Scott, Steve,

REST export in archive format still blows up with Fedora 3.4.2. It actually crashes on a datastream smaller than 300 MB. I gave the JVM 1.5 GB of heap, BTW.
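Base64 encoding does not in itself require holding a whole datastream in memory. It can be streamed to disk in fixed-size chunks, so the heap it needs stays constant whatever the object size. The sketch below is a minimal, hypothetical illustration of that idea, not Fedora's exporter:

- The class name `SpoolingBase64`, the 64 KB buffer and the file arguments are made up for the example.
- It assumes Java 8 or later for `java.util.Base64`. The Sun Java 1.6 mentioned in this thread predates that class.
- It does not establish whether Fedora's FOXML serializer could be restructured this way.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

/**
 * Hypothetical helper (not part of Fedora): base64-encodes a datastream into an
 * output stream chunk by chunk, so memory use does not grow with datastream size.
 * Assumes Java 8+ for java.util.Base64.
 */
public class SpoolingBase64 {

    /** Copies 'in' to 'out' as MIME base64 (76-char lines) using a fixed 64 KB buffer. */
    static long encode(InputStream in, OutputStream out) throws IOException {
        // wrap() encodes as bytes are written; closing it writes the final padded
        // group and also closes 'out'.
        try (OutputStream b64 = Base64.getMimeEncoder().wrap(out)) {
            byte[] buf = new byte[64 * 1024];
            long total = 0; // long counter, so sizes past 2 GB do not overflow
            int n;
            while ((n = in.read(buf)) != -1) {
                b64.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SpoolingBase64 <input-file> <output-file>");
            System.exit(1);
        }
        Path src = Paths.get(args[0]);
        Path dst = Paths.get(args[1]);
        // encode() already closes 'out'; the second close from try-with-resources is a no-op.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(dst))) {
            long bytes = encode(in, out);
            System.out.println("encoded " + bytes + " bytes to " + dst);
        }
    }
}
```

You might run it as `java SpoolingBase64 sample.bin sample.b64`. A real archive exporter would also have to write the surrounding FOXML before and after the encoded text.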

Regardless, the exception in fedora.log is a negative array index exception. According to the stack trace, it is actually occurring down in the base64 encoder.

It occurs to me that building support for a full archival export of arbitrarily large objects in memory might be pragmatically (is that a word?) impossible. For example, on 32-bit systems I think you bump into problems giving the JVM more than ~1.8 GB of RAM. That alone limits the size of exportable objects to well under 2 GB in that environment.

If I were more adept with Java, I'd volunteer to write an exporter that spooled to disk, but alas, I am not, and it would take me twice as long as someone who is. :-(

I can take one of several alternative paths with my particular project, so it isn't too big an issue to *me*; I just have to do a little more coding in a middle tier. I don't know about other folks, of course.

-Scott

On 05/18/2011 01:08 AM, Stephen Bayliss wrote:
> Looking at those lines of code, it looks like in theory there would be a
> problem there. Once this is confirmed, we should probably add a test case to
> the large datastreams test suite. It is also likely to cause a problem with
> datastreams smaller than 2 GB (2^31-1 being the maximum array index), because
> the archive export base64-encodes the content.
>
>> -----Original Message-----
>> From: Scott Prater [mailto:pra...@wisc.edu]
>> Sent: 17 May 2011 18:33
>> To: Support and info exchange list for Fedora users.
>> Subject: Re: [fcrepo-user] REST export API negative array index exception
>>
>> Yes, trying with the latest stable version (3.4.2) would be useful, if
>> you don't mind. There were some low-level garbage collection problems
>> that were fixed in the 3.4.2 release; these problems manifested
>> themselves in a variety of ways.
>>
>> I'm not saying this is the issue, but it wouldn't hurt to verify that
>> your problem can be reproduced in 3.4.2.
>>
>> thanks,
>>
>> -- Scott
>>
>> On 05/17/2011 12:22 PM, Scott Hammel wrote:
>>> I'm pretty sure it is 3.4.0 (from files on the server, it looks like an
>>> August 2010 build). The server is in a totally isolated network, with
>>> nothing with GUI support that can reach the admin tools.
>>>
>>> Tomcat is the version bundled with the Fedora installer.
>>>
>>> Would you like me to make sure I'm running the latest version and try
>>> the test scripts again before you go forward?
>>>
>>> Scott
>>>
>>> On 05/17/2011 12:45 PM, Scott Prater wrote:
>>>> Thanks, Scott. I'll try to reproduce the problem in my environment,
>>>> Fedora 3.4.2.
>>>>
>>>> Can you tell me what version of Fedora and Tomcat (or other webapp
>>>> server) you're using?
>>>>
>>>> -- Scott
>>>>
>>>> On 05/17/2011 11:08 AM, Scott Hammel wrote:
>>>>> Hey, Scott,
>>>>>
>>>>> Thanks for responding. I'm more a C/C++ programmer than a Java
>>>>> programmer (though I sometimes play one on the Internet), so I'm
>>>>> just guessing on the array bounds. It feels like something
>>>>> incrementing an int into the sign bit, though I'd think Java would
>>>>> throw some array bounds exception before that happened. Figured I'd
>>>>> do a little math later to test my hypothesis.
>>>>>
>>>>> Recall, this was all in a 32-bit environment. I really hope it is a
>>>>> non-issue and something I'm doing in the end. Note that disseminating
>>>>> the datastream content directly appears to work OK, which confuses me a
>>>>> little, though I haven't looked to see if the code for that does
>>>>> things differently.
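The "little math" Scott mentions can be sketched quickly. Java int arithmetic wraps silently on overflow instead of throwing. What fails is the later attempt to allocate an array with a negative length, which throws NegativeArraySizeException.

The snippet below is illustrative arithmetic only. The class and method names are hypothetical, and it does not claim that Fedora's base64 encoder sizes its buffers this way. A 400 MiB input does not overflow even the naive formula, so this alone would not explain the ~400 MB failure reported in this thread.

```java
/**
 * Illustrative arithmetic only (hypothetical names, not Fedora code):
 * how a base64 size computed in 32-bit int math can go negative.
 */
public class Base64SizeMath {

    /** Exact unwrapped base64 length, 4 * ceil(n / 3), computed in 64-bit long math. */
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    /** Naive 32-bit version: n * 4 wraps into the sign bit once n > 536,870,911. */
    static int naiveEncodedLength(int n) {
        return n * 4 / 3;
    }

    public static void main(String[] args) {
        int mib400 = 400 * 1024 * 1024;   // 419,430,400 bytes, the dd test file (bs=1M count=400)
        System.out.println(encodedLength(mib400));      // 559240536
        System.out.println(naiveEncodedLength(mib400)); // 559240533 (no overflow yet)

        int mib600 = 600 * 1024 * 1024;   // 629,145,600 bytes
        System.out.println(naiveEncodedLength(mib600)); // -592794965 (wrapped past the sign bit)
        // new byte[naiveEncodedLength(mib600)] would throw NegativeArraySizeException.

        // Largest input whose unwrapped base64 output still fits in Integer.MAX_VALUE chars.
        long maxInput = 3L * (Integer.MAX_VALUE / 4);
        System.out.println(maxInput);                   // 1610612733 (about 1.5 GiB)
    }
}
```

The same arithmetic bears on the ~2 GB ceiling discussed in this thread. Even with unlimited heap, a single byte[] or char[] holding all of the base64 text cannot exceed Integer.MAX_VALUE elements. That caps the raw input for a one-array encoding at roughly 1.5 GiB, not 2 GB.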
>>>>>
>>>>> Anyway, here's a series of commands (extracted from my test scripts)
>>>>> that should reproduce the problem:
>>>>>
>>>>> mkdir /usr/fedora/tomcat/webapps/ROOT/ingestpool
>>>>> mkdir /tmp/fedrun
>>>>> dir=/tmp/fedrun
>>>>> pid=test:pid01
>>>>>
>>>>> dd if=/dev/urandom of=/usr/fedora/tomcat/webapps/ROOT/ingestpool/sample.bin bs=1M count=400
>>>>>
>>>>> ./makefoxml "$pid" http://localhost:8080/ingestpool/sample.bin > "$dir/sample.xml"
>>>>>
>>>>> /usr/fedora/client/bin/fedora-ingest.sh f "$dir/sample.xml" info:fedora/fedora-system:FOXML-1.1 localhost:8080 fedoraAdmin <insert pwd here> http
>>>>>
>>>>> wget -O "$dir/export.xml" --auth-no-challenge --http-user=fedoraAdmin --http-password=<insert pwd here> "http://localhost:8080/fedora/objects/$pid/export?context=archive"
>>>>>
>>>>> Note: I use the REST call via wget rather than the provided export
>>>>> client scripts because, judging from the Java heap explosion, the
>>>>> export scripts must end up doing the export via the SOAP API.
>>>>>
>>>>> --
>>>>> The content of makefoxml:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> # usage: makefoxml <pid> <refurl>
>>>>> # escape the slashes in the URL so it can be used in a sed replacement
>>>>> RF=${2//\//\\/}
>>>>> # if you need to escape ampersands as well, uncomment this:
>>>>> #RF=${RF//'&'/'\&'}
>>>>>
>>>>> # make substitutions ....
>>>>> sed '
>>>>> s/PID=""/PID="'"$1"'"/
>>>>> s/rdf:about=""/rdf:about="info:fedora\/'"$1"'"/
>>>>> s/dc:identifier>/dc:identifier>'"$1"'/
>>>>> s/REF=""/REF="'"${RF}"'"/
>>>>> ' < "foxml_tpl.xml"
>>>>>
>>>>> --
>>>>> The content of foxml_tpl.xml (the sed script above does the edits
>>>>> noted in the XML comments in this template):
>>>>>
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <!-- following element: set the PID attribute -->
>>>>> <foxml:digitalObject VERSION="1.1" PID=""
>>>>>     xmlns:foxml="info:fedora/fedora-system:def/foxml#"
>>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>     xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
>>>>>
>>>>>   <foxml:objectProperties>
>>>>>     <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
>>>>>     <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE=""/>
>>>>>     <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="fedoraAdmin"/>
>>>>>   </foxml:objectProperties>
>>>>>
>>>>>   <foxml:datastream CONTROL_GROUP="X" ID="RELS-EXT">
>>>>>     <foxml:datastreamVersion FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0"
>>>>>         ID="RELS-EXT.0" LABEL="RDF Statements about this Object"
>>>>>         MIMETYPE="application/rdf+xml">
>>>>>       <foxml:xmlContent>
>>>>>         <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
>>>>>             xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
>>>>>             xmlns:fedora-model="info:fedora/fedora-system:def/model#"
>>>>>             xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
>>>>>             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
>>>>>           <!-- following element: put the PID as the value for the rdf:about attribute -->
>>>>>           <rdf:Description rdf:about="">
>>>>>           </rdf:Description>
>>>>>         </rdf:RDF>
>>>>>       </foxml:xmlContent>
>>>>>     </foxml:datastreamVersion>
>>>>>   </foxml:datastream>
>>>>>
>>>>>   <foxml:datastream CONTROL_GROUP="X" ID="DC" STATE="A" VERSIONABLE="true">
>>>>>     <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
>>>>>       <foxml:xmlContent>
>>>>>         <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
>>>>>             xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
>>>>>             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
>>>>>           <dc:title></dc:title>
>>>>>           <dc:creator>Test Program</dc:creator>
>>>>>           <dc:description>A test object</dc:description>
>>>>>           <!-- following element: put the PID between the tags -->
>>>>>           <dc:identifier></dc:identifier>
>>>>>         </oai_dc:dc>
>>>>>       </foxml:xmlContent>
>>>>>     </foxml:datastreamVersion>
>>>>>   </foxml:datastream>
>>>>>
>>>>>   <foxml:datastream CONTROL_GROUP="M" ID="Content" STATE="A">
>>>>>     <foxml:datastreamVersion ID="Content.0" LABEL="This is the object content"
>>>>>         MIMETYPE="application/octet-stream">
>>>>>       <!-- following element: put the URL to the content file as the value for the REF attribute -->
>>>>>       <!-- must be an http URL, e.g., http://localhost:8080/ingestpool/foxmldoc.xml -->
>>>>>       <!-- I just create a directory "ingestpool" under /usr/fedora/tomcat/webapps/ROOT and put the files there -->
>>>>>       <foxml:contentLocation REF="" TYPE="URL"/>
>>>>>     </foxml:datastreamVersion>
>>>>>   </foxml:datastream>
>>>>>
>>>>> </foxml:digitalObject>
>>>>>
>>>>> On 05/17/2011 10:00 AM, Scott Prater wrote:
>>>>>> Scott,
>>>>>>
>>>>>> Can you come up with a test case that confirms this limitation? If
>>>>>> you can provide one, I'll open up a JIRA ticket for the issue.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> -- Scott
>>>>>>
>>>>>> On 05/16/2011 10:45 AM, Scott Hammel wrote:
>>>>>>> Oh, I think I see: the last line of the serializer's serialize
>>>>>>> function does this:
>>>>>>>     bytes.toByteArray()
>>>>>>> where bytes is a ByteArrayOutputStream.
>>>>>>>
>>>>>>> I *think* the max size of an array index in Java (32-bit) is
>>>>>>> 2,147,483,647 (i.e., 2^31 - 1, the max value of a Java int). So this
>>>>>>> function will throw an exception if a datastream "archive" export
>>>>>>> is > ~2 GB.
>>>>>>>
>>>>>>> scott
>>>>>>>
>>>>>>> On 05/16/2011 11:00 AM, Scott Hammel wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Running some export tests using Fedora's REST export API, I get a
>>>>>>>> negative array index Java exception when doing an "archive" export of an
>>>>>>>> object at around 400 MB (> 320 MB, < 450 MB).
>>>>>>>>
>>>>>>>> Fedora is version 3.4 something, running on 32-bit CentOS 5.5,
>>>>>>>> Sun Java 1.6, 21
>>>>>>>>
>>>>>>>> Is it just me, or has anyone else seen something like that?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Achieve unprecedented app performance and reliability
>>>>>>>> What every C/C++ and Fortran developer should know.
>>>>>>>> Learn how Intel has extended the reach of its next-generation tools
>>>>>>>> to help boost performance applications - including clusters.
>>>>>>>> http://p.sf.net/sfu/intel-dev2devmay
>>>>>>>> _______________________________________________
>>>>>>>> Fedora-commons-users mailing list
>>>>>>>> Fedora-commons-users@lists.sourceforge.net
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>
>> --
>> Scott Prater
>> Library, Instructional, and Research Applications (LIRA)
>> Division of Information Technology (DoIT)
>> University of Wisconsin - Madison
>> pra...@wisc.edu