Hey Scott,

Thanks for responding. I'm more of a C/C++ programmer than a Java programmer (though I sometimes play one on the Internet), so I'm just guessing about the array bounds. It feels like something is incrementing an int into the sign bit, though I'd expect Java to throw an array-bounds exception before that happened. I figured I'd do a little math later to test my hypothesis.
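A rough sketch of what that math might look like, in plain shell arithmetic. The helper names `to_int32` and `b64_size` are made up for illustration and are not part of Fedora. It also assumes, without having checked the serializer code, that an "archive" export embeds the managed content base64-encoded, and it relies on the shell doing 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fedora): wrap a value the way a
# 32-bit signed Java int would. Relies on 64-bit shell arithmetic.
to_int32() {
    v=$(( $1 & 0xFFFFFFFF ))
    if [ $(( v >= 0x80000000 )) -eq 1 ]; then
        v=$(( v - 0x100000000 ))
    fi
    echo "$v"
}

# Hypothetical helper: size of N bytes after base64 encoding
# (4 output characters for every 3 input bytes, rounded up).
# Assumption: the archive export base64-encodes the content.
b64_size() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

max_int=2147483647               # 2^31 - 1, the largest Java int
raw=$(( 400 * 1024 * 1024 ))     # size of the 400 MiB sample.bin

echo "max_int + 1 as an int: $(to_int32 $(( max_int + 1 )))"
echo "raw bytes:             $raw"
echo "base64 bytes:          $(b64_size "$raw")"
```

This reports -2147483648 for the wrapped value, 419430400 raw bytes, and 559240536 bytes for the base64 estimate. If the base64 assumption holds, a single encoded copy is still well under 2^31 - 1, so the numbers alone neither confirm nor rule out the sign-bit hypothesis; something would have to count or copy the data more than once to get there.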
Recall, this was all in a 32-bit environment. I really hope that, in the end, it is a non-issue and just something I'm doing. Note that disseminating the datastream content directly appears to work OK, which confuses me a little, though I haven't looked to see if the code for that does things differently.

Anyway, here's a series of commands (extracted from my test scripts) that should reproduce the problem:

mkdir /usr/fedora/tomcat/webapps/ROOT/ingestpool
mkdir /tmp/fedrun
dir=/tmp/fedrun
pid=test:pid01
dd if=/dev/urandom of=/usr/fedora/tomcat/webapps/ROOT/ingestpool/sample.bin bs=1M count=400
./makefoxml $pid http://localhost:8080/ingestpool/sample.bin > $dir/sample.xml
/usr/fedora/client/bin/fedora-ingest.sh f $dir/sample.xml info:fedora/fedora-system:FOXML-1.1 localhost:8080 fedoraAdmin <insert pwd here> http
wget -O $dir/export.xml --auth-no-challenge --http-user=fedoraAdmin --http-password=<insert pwd here> http://localhost:8080/fedora/objects/$pid/export?context=archive

Note: I use the REST call via wget rather than the provided export client scripts because it looks to me, from the Java heap explosion, that the export scripts must end up doing the export via the SOAP API.

--

The content of makefoxml:

#!/bin/bash
#usage: makefoxml <pid> <refurl>
#escape slashes off the URL
RF=${2//\//\\/}
#if you need to escape ampersands as well, uncomment this:
#RF=${RF//'&'/'\&'}
# make substitutions ....
sed '
s/PID=""/PID="'"$1"'"/
s/rdf:about=""/rdf:about="info:fedora\/'"$1"'"/
s/dc:identifier>/dc:identifier>'"$1"'/
s/REF=""/REF="'"${RF}"'"/
' < "foxml_tpl.xml"

--

The content of foxml_tpl.xml (the sed script above does the edits noted in the xml comments in this template):

<?xml version="1.0" encoding="UTF-8"?>
<!-- following element: set the PID attribute -->
<foxml:digitalObject VERSION="1.1" PID=""
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE=""/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="fedoraAdmin"/>
  </foxml:objectProperties>
  <foxml:datastream CONTROL_GROUP="X" ID="RELS-EXT">
    <foxml:datastreamVersion FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" ID="RELS-EXT.0"
        LABEL="RDF Statements about this Object" MIMETYPE="application/rdf+xml">
      <foxml:xmlContent>
        <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
            xmlns:fedora-model="info:fedora/fedora-system:def/model#"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
          <!-- following element: put the PID as the value for the rdf:about attribute -->
          <rdf:description rdf:about="">
          </rdf:description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream CONTROL_GROUP="X" ID="DC" STATE="A" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title></dc:title>
          <dc:creator>Test Program</dc:creator>
          <dc:description>A test object</dc:description>
          <!-- following element: put the PID between the tags -->
          <dc:identifier></dc:identifier>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream CONTROL_GROUP="M" ID="Content" STATE="A">
    <foxml:datastreamVersion ID="Content.0" LABEL="This is the object content" MIMETYPE=" application/octet-stream">
      <!-- following element: put the URL to the content file as the value for the REF attribute -->
      <!-- must be an http URL, e.g., http://localhost:8080/ingestpool/foxmldoc.xml -->
      <!-- I just create a directory "ingestpool" under /usr/fedora/tomcat/webapps/ROOT and put the files there -->
      <foxml:contentLocation REF="" TYPE="URL" />
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>

On 05/17/2011 10:00 AM, Scott Prater wrote:
> Scott,
>
> Can you come up with a test case that confirms this limitation? If you
> can provide one, I'll open up a JIRA ticket for the issue.
>
> thanks,
>
> -- Scott
>
> On 05/16/2011 10:45 AM, Scott Hammel wrote:
>> Oh, I think I see: last line of the serializer's serialize function does
>> this:
>> bytes.toByteArray()
>> where bytes is a ByteArrayOutputStream
>>
>> I *think* the max size of an array index in Java (32-bit) is
>> 2,147,483,647 (i.e., 2^31 - 1, max value of a java int). So, this
>> function will throw an exception if a datastream "archive" export is > ~2 GB.
>>
>> scott
>>
>> On 05/16/2011 11:00 AM, Scott Hammel wrote:
>>> Hi,
>>>
>>> Running some export tests using Fedora's REST export API, I get a
>>> negative array index Java exception when doing an "archive" export of an
>>> object at around 400 MB (> 320 MB, < 450 MB).
>>>
>>> Fedora is version 3.4 something; running on 32-bit CentOS 5.5, Sun Java
>>> 1.6, 21
>>>
>>> Is it just me or has anyone else seen something like that?
>>>
>>> Thanks,
>>> Scott
>>>
>>> ------------------------------------------------------------------------------
>>> Achieve unprecedented app performance and reliability
>>> What every C/C++ and Fortran developer should know.
>>> Learn how Intel has extended the reach of its next-generation tools
>>> to help boost performance applications - inlcuding clusters.
>>> http://p.sf.net/sfu/intel-dev2devmay
>>> _______________________________________________
>>> Fedora-commons-users mailing list
>>> Fedora-commons-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users