Hey, Scott,

Thanks for responding. I'm more of a C/C++ programmer than a Java
programmer (though I sometimes play one on the Internet), so I'm just
guessing about the array bounds. It feels like an int being incremented
into the sign bit, though I'd think Java would throw some array bounds
exception before that happened. I figured I might do a little math later
to test my hypothesis.
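A little Java sketch of the math I have in mind. The class and method names are made up, and this only demonstrates int wraparound in general, not what Fedora's serializer actually does:

```java
public class SignBitDemo {
    // Doubles a capacity with plain int arithmetic. Java does not trap on
    // int overflow: a result that needs bit 31 silently becomes negative.
    static int doubled(int capacity) {
        return capacity << 1;
    }

    // Returns true if allocating a byte array of the given size throws
    // NegativeArraySizeException (what a wrapped size would trigger).
    static boolean allocationFails(int size) {
        try {
            byte[] buf = new byte[size];
            return buf.length != size; // false: the allocation succeeded
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int cap = 1 << 30;        // 1,073,741,824 bytes (1 GiB)
        int next = doubled(cap);  // 2^31 does not fit in an int, so it wraps
        System.out.println(cap + " << 1 = " + next);
        System.out.println(Integer.MAX_VALUE + " + 1 = " + (Integer.MAX_VALUE + 1));
        System.out.println("new byte[" + next + "] throws: " + allocationFails(next));
    }
}
```

Running it prints `1073741824 << 1 = -2147483648`, `2147483647 + 1 = -2147483648` and `new byte[-2147483648] throws: true`. In other words, the wraparound itself is silent; the exception only shows up later, when the negative number is used as a size (NegativeArraySizeException) or as an index (ArrayIndexOutOfBoundsException).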

Recall that this was all in a 32-bit environment. I really hope it turns
out to be a non-issue, something I'm doing wrong. Note that disseminating
the datastream content directly appears to work OK, which confuses me a
little, though I haven't checked whether the code for that path does
things differently.

Anyway, here's a series of commands (extracted from my test scripts) 
that should reproduce the problem:

mkdir /usr/fedora/tomcat/webapps/ROOT/ingestpool
mkdir /tmp/fedrun
dir=/tmp/fedrun
pid=test:pid01

dd if=/dev/urandom \
   of=/usr/fedora/tomcat/webapps/ROOT/ingestpool/sample.bin bs=1M count=400

./makefoxml $pid http://localhost:8080/ingestpool/sample.bin > $dir/sample.xml

/usr/fedora/client/bin/fedora-ingest.sh f $dir/sample.xml \
   info:fedora/fedora-system:FOXML-1.1 localhost:8080 fedoraAdmin \
   <insert pwd here> http

wget -O $dir/export.xml --auth-no-challenge --http-user=fedoraAdmin \
   --http-password=<insert pwd here> \
   "http://localhost:8080/fedora/objects/$pid/export?context=archive"

Note: I call the REST API with wget rather than using the provided export
client scripts because, judging from the Java heap explosion, the export
scripts must end up doing the export via the SOAP API.
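And a back-of-the-envelope estimate for the 400 MB test file. This assumes the "archive" export inlines the Managed (M) datastream as base64 (my understanding of that context, not something I've verified in the code) and that the serializer's ByteArrayOutputStream starts at its default 32 bytes and simply doubles as it grows. Line breaks and XML overhead are ignored, so these are lower bounds; the class and method names are made up:

```java
public class ExportSizeEstimate {
    // Size of the test file: dd bs=1M count=400, i.e. 400 MiB.
    static final long RAW_BYTES = 400L * 1024 * 1024;

    // Base64 output length without line breaks: 4 chars per (padded) 3-byte group.
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Assumption: the buffer starts at 32 bytes and doubles until it can hold
    // `needed` bytes (i.e. writes are small relative to the buffer).
    static long doublingCapacity(long needed) {
        long cap = 32;
        while (cap < needed) {
            cap <<= 1;
        }
        return cap;
    }

    public static void main(String[] args) {
        long encoded = base64Length(RAW_BYTES);
        long cap = doublingCapacity(encoded);
        System.out.println("raw bytes:         " + RAW_BYTES);
        System.out.println("base64 bytes:      " + encoded);
        System.out.println("doubled capacity:  " + cap);
        System.out.println("Integer.MAX_VALUE: " + Integer.MAX_VALUE);
    }
}
```

That gives 419,430,400 raw bytes, 559,240,536 base64 bytes and a 1,073,741,824-byte buffer. If those assumptions hold, the buffer has to reach 1 GiB before toByteArray() makes another ~560 MB copy: still under Integer.MAX_VALUE, but a lot to ask of a 32-bit JVM heap.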
--
The content of makefoxml:

#!/bin/bash

#usage: makefoxml <pid> <refurl>
#escape slashes in the URL so they don't end the sed expression
RF=${2//\//\\/}
#if you need to escape ampersands as well, uncomment this:
#RF=${RF//'&'/'\&'}

# make substitutions ....
sed '
s/PID=""/PID="'"$1"'"/
s/rdf:about=""/rdf:about="info:fedora\/'"$1"'"/
s/dc:identifier>/dc:identifier>'"$1"'/
s/REF=""/REF="'"${RF}"'"/
' < "foxml_tpl.xml"

--
The content of foxml_tpl.xml (the sed script above makes the edits noted
in the XML comments in this template):

<?xml version="1.0" encoding="UTF-8"?>
<!-- following element: set the PID attribute -->
<foxml:digitalObject VERSION="1.1" PID=""
   xmlns:foxml="info:fedora/fedora-system:def/foxml#"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">

<foxml:objectProperties>
<foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE=""/>
<foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" 
VALUE="fedoraAdmin"/>
</foxml:objectProperties>

<foxml:datastream CONTROL_GROUP="X" ID="RELS-EXT">
<foxml:datastreamVersion 
FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0"
       ID="RELS-EXT.0" LABEL="RDF Statements about this Object" 
MIMETYPE="application/rdf+xml">
<foxml:xmlContent>
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
           xmlns:fedora-model="info:fedora/fedora-system:def/model#"
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<!-- following element: put the PID as the value for the rdf:about attribute -->
<rdf:Description rdf:about="">
</rdf:Description>
</rdf:RDF>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>

<foxml:datastream CONTROL_GROUP="X" ID="DC" STATE="A" VERSIONABLE="true">
<foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
<foxml:xmlContent>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title></dc:title>
<dc:creator>Test Program</dc:creator>
<dc:description>A test object</dc:description>
<!-- following element: put the PID between the tags -->
<dc:identifier></dc:identifier>
</oai_dc:dc>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>

<foxml:datastream CONTROL_GROUP="M" ID="Content" STATE="A">
<foxml:datastreamVersion ID="Content.0" LABEL="This is the object content" MIMETYPE="application/octet-stream">
<!-- following element: put the URL to the content file as the value for the REF attribute -->
<!-- must be an http URL, e.g., http://localhost:8080/ingestpool/foxmldoc.xml -->
<!-- I just create a directory "ingestpool" under /usr/fedora/tomcat/webapps/ROOT and put the files there -->
<foxml:contentLocation REF="" TYPE="URL" />
</foxml:datastreamVersion>
</foxml:datastream>


</foxml:digitalObject>



On 05/17/2011 10:00 AM, Scott Prater wrote:
> Scott,
>
> Can you come up with a test case that confirms this limitation?  If you
> can provide one, I'll open up a JIRA ticket for the issue.
>
> thanks,
>
> -- Scott
>
> On 05/16/2011 10:45 AM, Scott Hammel wrote:
>> Oh, I think I see: the last line of the serializer's serialize function
>> does this:
>> bytes.toByteArray()
>> where bytes is a ByteArrayOutputStream
>>
>> I *think* the max size of an array index in Java (32-bit) is
>> 2,147,483,647 (i.e., 2^31 - 1, max value of a Java int). So, this
>> function will throw an exception if a datastream "archive" export is
>> larger than ~2 GB.
>>
>> scott
>>
>> On 05/16/2011 11:00 AM, Scott Hammel wrote:
>>> Hi,
>>>
>>> Running some export tests using Fedora's REST export API, I get a
>>> negative array index Java exception when doing an "archive" export of an
>>> object at around 400 MB (> 320 MB, < 450 MB).
>>>
>>> Fedora is version 3.4 something; running on 32-bit CentOS 5.5, Sun Java
>>> 1.6, 21
>>>
>>> Is it just me or has anyone else seen something like that?
>>>
>>> Thanks,
>>> Scott
>>>
>>> ------------------------------------------------------------------------------
>>> Achieve unprecedented app performance and reliability
>>> What every C/C++ and Fortran developer should know.
>>> Learn how Intel has extended the reach of its next-generation tools
>>> to help boost performance applications - inlcuding clusters.
>>> http://p.sf.net/sfu/intel-dev2devmay
>>> _______________________________________________
>>> Fedora-commons-users mailing list
>>> Fedora-commons-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>>
>>
>

