Thanks for looking at that, Scott.
It sounds like we need an alternative such as
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64InputStream.html
which does streaming encoding (unlimited size).
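A minimal sketch of the streaming idea, for illustration only. It uses the JDK's own java.util.Base64 (Java 8 and later, so newer than the Java 1.6 mentioned in this thread) as a stand-in for the commons-codec Base64InputStream linked above. The class and method names (StreamingBase64Sketch, encode) are made up for the example, and this is not Fedora's exporter code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class StreamingBase64Sketch {

    // Copies raw bytes from 'in' to 'out' as base64 text, one small buffer
    // at a time. Memory use is bounded by the buffer size, not by the size
    // of the datastream being encoded.
    static void encode(InputStream in, OutputStream out) throws IOException {
        // Encoder.wrap returns a stream that base64-encodes what is written to it.
        OutputStream b64 = Base64.getEncoder().wrap(out);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            b64.write(buffer, 0, n);
        }
        // Closing writes the final (padded) group and also closes 'out'.
        b64.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical tiny input; a real export would read from the datastream
        // and write to the HTTP response or a file.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream("hello".getBytes("US-ASCII")), sink);
        System.out.println(sink.toString("US-ASCII")); // aGVsbG8=
    }
}
```

In a real export the sink would be the response stream or a spool file rather than a ByteArrayOutputStream, otherwise the whole encoded result still ends up in memory.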
Steve
-----Original Message-----
From: Scott Hammel [mailto:sc...@clemson.edu]
Sent: 18 May 2011 21:43
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception
One last thing then I'll quit gabbing so this list doesn't stay as busy as
the Solr user's list :-) just to summarize:
I poked into the source for Apache Commons base64 codec 1.3 at the line
indicated in my error logs: it's a line in encodeBase64() where a byte array
is allocated for storage. To compute the array size to allocate, the method
multiplies the size of the incoming binary data array by 8. So 300 MB => 300
* 2^20 * 8 which is > max int, I do believe.
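The arithmetic above can be checked with a short sketch. It only mimics the "length times 8 held in an int" step as described; the class and method names are hypothetical and this is not the commons-codec source. A negative value like the one printed here, if later used as an array size, would make the JVM throw NegativeArraySizeException.

```java
public class Base64SizeOverflowSketch {

    // Input length times 8, held in an int as described above.
    // Java int arithmetic wraps silently past Integer.MAX_VALUE.
    static int bitsAsInt(int inputLengthBytes) {
        return inputLengthBytes * 8;
    }

    // The same product computed in a long, which does not wrap at this scale.
    static long bitsAsLong(int inputLengthBytes) {
        return inputLengthBytes * 8L;
    }

    public static void main(String[] args) {
        int size = 300 * (1 << 20);             // 300 MB = 314,572,800 bytes
        System.out.println(Integer.MAX_VALUE);  // 2147483647
        System.out.println(bitsAsLong(size));   // 2516582400
        System.out.println(bitsAsInt(size));    // -1778384896
    }
}
```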
Next limit is the practical limit of JVM RAM on a 32-bit server: really
slightly less than 2 GB.
Next limit is the fact that it looks like a ByteArrayOutputStream uses a
byte array as a buffer, and the limit is max int again, but this time for #
of bytes in the datastream (appx 2GB).
Scott
On 05/18/2011 03:53 PM, Scott Hammel wrote:
I see where you are going :-)
I just ran a 400MB test with an ATOMZip export. Seems to have worked
just fine.
A 900MB datastream export to ATOMZip test failed. No exception generated
in the logs, just an internal server error. I noticed with 3.4.2 this
can indicate the JVM ran out of memory (not surprising if the export is
still being collected into a ByteArrayOutputStream, I guess).
Scott
On 05/18/2011 11:53 AM, Stephen Bayliss wrote:
Hi Scott
Thanks for that feedback.
It would be interesting to find out if you get the same problem using the
AtomZip export format (info:fedora/fedora-system:ATOMZip-1.1)
Steve
-----Original Message-----
From: Scott Hammel [mailto:sc...@clemson.edu]
Sent: 18 May 2011 16:16
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception
Scott, Steve,
REST export in archive format still blows up with Fedora 3.4.2. Actually,
it is crashing on a datastream < 300MB. I gave the JVM 1.5GB of heap, BTW.
Regardless, the exception that is in fedora.log is a negative array
index exception. It looks like it is actually occurring down in the
base64 encoder according to the stack trace.
It occurs to me that building support for a full archival export of an
object in memory for arbitrarily large objects might be pragmatically
(is that a word?) impossible: e.g., on 32-bit systems I think you bump
into problems giving the JVM more than ~1.8 GB of RAM. That alone limits
the size of exportable objects to well under 2GB in that environment.
If I was more adept with Java, I'd volunteer to write an exporter that
spooled to disk, but alas, I am not and it would take me twice as long
as someone who is. :-(
I can take one of several alternative paths with my particular project,
so it isn't too big an issue to *me* .... I just have to do a little
more coding in a middle-tier. Don't know about other folks, of course.
-Scott
On 05/18/2011 01:08 AM, Stephen Bayliss wrote:
Looking at those lines of code it looks like in theory there would be
a problem there. Once this is confirmed we should probably add a test
case to the large datastreams test suite. And it is likely to cause a
problem with datastreams smaller than 2GB (2^31-1 as maximum array
index) due to the archive export base64-encoding the content.
-----Original Message-----
From: Scott Prater [mailto:pra...@wisc.edu]
Sent: 17 May 2011 18:33
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception
Yes, trying with the latest stable version (3.4.2) would be useful,
if you don't mind. There were some low-level garbage collection
problems that were fixed in the 3.4.2 release; these problems
manifested themselves in a variety of ways.
I'm not saying this is the issue, but it wouldn't hurt to verify that
your problem can be reproduced in 3.4.2.
thanks,
-- Scott
On 05/17/2011 12:22 PM, Scott Hammel wrote:
I'm pretty sure it is 3.4.0 (from files on the server it looks like an
August 2010 build. The server is in a totally isolated network with
nothing with GUI support that can hit the admin tools).
Tomcat is the version bundled with the Fedora installer.
Would you like me to be sure I'm running at the latest version and try
the test scripts again before you go forward?
Scott
On 05/17/2011 12:45 PM, Scott Prater wrote:
Thanks, Scott. I'll try to reproduce the problem in my environment,
Fedora 3.4.2.
Can you tell me what version of Fedora and Tomcat (or other webapp
server) you're using?
-- Scott
On 05/17/2011 11:08 AM, Scott Hammel wrote:
Hey, Scott,
Thanks for responding. I'm more a C/C++ programmer and not a Java
programmer (though I sometimes play one on the Internet), so I'm
just guessing on the array bounds -- feels like something
incrementing an int into the sign bit, though I'd think Java would
throw some array bounds exception before that happened. Figured I'd
do a little math later maybe to test my hypothesis.
Recall, this was all in a 32-bit environment. I really hope it is a
non-issue and something I'm doing in the end. Note disseminating the
datastream content directly appears to work OK, which confuses me a
little, though I haven't looked to see if the code for that does
things differently.
Anyway, here's a series of commands (extracted from my test scripts)
that should reproduce the problem:
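# directory served by Tomcat, used as the source for ingest by reference
mkdir /usr/fedora/tomcat/webapps/ROOT/ingestpool
# scratch directory for the generated FOXML and the export result
mkdir /tmp/fedrun
dir=/tmp/fedrun
pid=test:pid01
# create 400 blocks of 1M of random data as the test datastream
dd if=/dev/urandom \
  of=/usr/fedora/tomcat/webapps/ROOT/ingestpool/sample.bin bs=1M count=400
# build the FOXML for the test object from the template
./makefoxml $pid http://localhost:8080/ingestpool/sample.bin > $dir/sample.xml
# ingest the object
/usr/fedora/client/bin/fedora-ingest.sh f $dir/sample.xml \
  info:fedora/fedora-system:FOXML-1.1 localhost:8080 \
  fedoraAdmin <insert pwd here> http
# archive export over REST; the URL is quoted so the shell leaves the "?" alone
wget -O $dir/export.xml --auth-no-challenge --http-user=fedoraAdmin \
  --http-password=<insert pwd here> \
  "http://localhost:8080/fedora/objects/$pid/export?context=archive"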
Note: I use the REST call via a wget rather than the provided export
client scripts because it looks to me from the Java heap explosion
that the export scripts must end up doing the export via the SOAP
API.
--
The content of makefoxml:
#!/bin/bash
# usage: makefoxml <pid> <refurl>
# escape slashes in the URL so it can be used in the sed replacement
RF=${2//\//\\/}
# if you need to escape ampersands as well, uncomment this:
#RF=${RF//'&'/'\&'}
# make substitutions ....
sed '
s/PID=""/PID="'"$1"'"/
s/rdf:about=""/rdf:about="info:fedora\/'"$1"'"/
s/dc:identifier>/dc:identifier>'"$1"'/
s/REF=""/REF="'"${RF}"'"/
' < "foxml_tpl.xml"
--
The content of foxml_tpl.xml (the sed script above does the edits
noted in the xml comments in this template):
<?xml version="1.0" encoding="UTF-8"?>
<!-- following element: set the PID attribute -->
<foxml:digitalObject VERSION="1.1" PID=""
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<foxml:objectProperties>
<foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE=""/>
<foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="fedoraAdmin"/>
</foxml:objectProperties>
<foxml:datastream CONTROL_GROUP="X" ID="RELS-EXT">
<foxml:datastreamVersion
FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0"
ID="RELS-EXT.0" LABEL="RDF Statements about this Object"
MIMETYPE="application/rdf+xml">
<foxml:xmlContent>
<rdf:RDF
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
xmlns:fedora-model="info:fedora/fedora-system:def/model#"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<!-- following element: put the PID as the value for the rdf:about attribute -->
<rdf:description rdf:about="">
</rdf:description>
</rdf:RDF>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream CONTROL_GROUP="X" ID="DC" STATE="A" VERSIONABLE="true">
<foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
<foxml:xmlContent>
<oai_dc:dc
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title></dc:title>
<dc:creator>Test Program</dc:creator>
<dc:description>A test object</dc:description>
<!-- following element: put the PID between the tags -->
<dc:identifier></dc:identifier>
</oai_dc:dc>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
<foxml:datastream CONTROL_GROUP="M" ID="Content" STATE="A">
<foxml:datastreamVersion ID="Content.0" LABEL="This is the object content" MIMETYPE="application/octet-stream">
<!-- following element: put the URL to the content file as the value
for the REF attribute -->
<!-- must be an http URL, e.g.,
http://localhost:8080/ingestpool/foxmldoc.xml -->
<!-- I just create a directory "ingestpool" under
/usr/fedora/tomcat/webapps/ROOT and put the files there -->
<foxml:contentLocation REF="" TYPE="URL" />
</foxml:datastreamVersion>
</foxml:datastream>
</foxml:digitalObject>
On 05/17/2011 10:00 AM, Scott Prater wrote:
Scott,
Can you come up with a test case that confirms this limitation? If
you can provide one, I'll open up a JIRA ticket for the issue.
thanks,
-- Scott
On 05/16/2011 10:45 AM, Scott Hammel wrote:
Oh, I think I see: last line of the serializer's serialize function does
this:
bytes.toByteArray()
where bytes is a ByteArrayOutputStream
I *think* the max size of an array index in Java (32-bit) is
2,147,483,647 (i.e., 2^31 - 1, max value of a java int). So, this
function will throw an exception if a datastream "archive" export
is > ~2 GB.
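A small sketch of that limit, under the assumption that the export has to fit in a single byte[] as described. The names (ArrayLimitSketch, fitsInOneByteArray) are hypothetical. The bound shown is only the theoretical one from int indexing; some JVMs reject lengths slightly below it, and available heap is normally the tighter constraint.

```java
public class ArrayLimitSketch {

    // Largest length a Java array can be asked for: 2^31 - 1.
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE;

    // True if a payload of sizeBytes could, in principle, be held in one byte[].
    static boolean fitsInOneByteArray(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_ARRAY_BYTES;
    }

    public static void main(String[] args) {
        long twoGiB = 1L << 31; // 2,147,483,648 bytes
        System.out.println(fitsInOneByteArray(400L * 1024 * 1024)); // true
        System.out.println(fitsInOneByteArray(twoGiB - 1));         // true
        System.out.println(fitsInOneByteArray(twoGiB));             // false
    }
}
```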
scott
On 05/16/2011 11:00 AM, Scott Hammel wrote:
Hi,
Running some export tests using Fedora's REST export API, I get a
negative array index Java exception when doing an "archive" export of an
object at around 400 MB (> 320 MB, < 450 MB).
Fedora is version 3.4 something; running on 32-bit CentOS 5.5,
Sun Java 1.6, 21
Is it just me or has anyone else seen something like that?
Thanks,
Scott
-----------------------------------------------------------------
-------------
Achieve unprecedented app performance and reliability What
every C/C++ and Fortran developer should know. Learn
how Intel
has extended the reach of its
next-generation tools
to help boost performance applications - inlcuding clusters.
http://p.sf.net/sfu/intel-dev2devmay
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
--
Scott Prater
Library, Instructional, and Research Applications (LIRA)
Division of Information Technology (DoIT) University of
Wisconsin - Madison pra...@wisc.edu