Thanks for looking at that, Scott.

It sounds like we need an alternative such as
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64InputStream.html,
which does streaming encoding (unlimited size).

Steve
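Streaming encoding means the encoder never needs the whole input or output in one array. Here is a minimal sketch of that idea. It assumes Java 8+, which postdates this thread, and it uses the JDK's `java.util.Base64` encoder stream in place of the commons-codec `Base64InputStream` linked above. `StreamingBase64` is a hypothetical name, not Fedora code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not Fedora or commons-codec code).
class StreamingBase64 {
    // Copies `in` to `out`, Base64-encoding on the fly. Only one fixed-size
    // buffer is held in memory, so the input length is not bounded by the
    // maximum Java array size.
    static void encode(InputStream in, OutputStream out) throws IOException {
        // The wrapping stream emits encoded bytes as input arrives. Closing it
        // writes any final padding and also closes `out`.
        try (OutputStream encoder = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                encoder.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream("Fedora".getBytes(StandardCharsets.US_ASCII)), out);
        System.out.println(out.toString("US-ASCII")); // prints RmVkb3Jh
    }
}
```

In a server, `out` would be the response stream (or a file), so memory use stays flat no matter how large the datastream is.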

-----Original Message-----
From: Scott Hammel [mailto:sc...@clemson.edu] 
Sent: 18 May 2011 21:43
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception


One last thing, then I'll quit gabbing so this list doesn't stay as busy as
the Solr users' list :-) Just to summarize:

I poked into the source for Apache Commons Codec 1.3 (the Base64 class) at
the line indicated in my error logs: it's a line in encodeBase64() where a
byte array is allocated for storage. To compute the array size to allocate,
the method multiplies the length of the incoming binary data array by 8, in
int arithmetic. So 300 MB => 300 * 2^20 * 8 = 2,516,582,400, which is
greater than the max int (2,147,483,647) and wraps around to a negative
value, I do believe.
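The arithmetic above can be checked directly. This sketch assumes, as described, that the size is derived from `length * 8`. Because an array length and the literal 8 are both ints, that product is evaluated in 32-bit arithmetic whatever variable it is stored in. The class and method names are hypothetical, and this is not commons-codec's code.

```java
// Hypothetical illustration of the size computation described above.
class Codec13SizeCheck {
    static final int EIGHT_BITS = 8;

    // int * int is evaluated in 32-bit arithmetic and wraps silently on overflow.
    static int bitsAsInt(int length) {
        return length * EIGHT_BITS;
    }

    // Widening one operand first gives the mathematically correct product.
    static long bitsAsLong(int length) {
        return (long) length * EIGHT_BITS;
    }

    public static void main(String[] args) {
        int threeHundredMiB = 300 * (1 << 20);               // 314,572,800 bytes
        System.out.println(bitsAsLong(threeHundredMiB));     // 2516582400
        System.out.println(bitsAsInt(threeHundredMiB));      // -1778384896
        // Largest length whose bit count still fits in an int: 2^28 - 1.
        System.out.println(Integer.MAX_VALUE / EIGHT_BITS);  // 268435455
    }
}
```

Under that assumption, the product first overflows at 2^28 bytes (256 MiB). That is consistent with the 3.4.2 failure on a datastream under 300 MB.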

The next limit is the practical ceiling on JVM memory on a 32-bit server:
really slightly less than 2 GB.

The next limit after that is that a ByteArrayOutputStream uses a single
byte array as its buffer, so it is capped at the max int again, but this
time as the number of bytes it can hold (approx. 2 GB).

Scott

On 05/18/2011 03:53 PM, Scott Hammel wrote: 

I see where you are going :-)

I just ran a 400MB test with an ATOMZip export. It seems to have worked
just fine.

A 900MB datastream export to ATOMZip failed. No exception was generated
in the logs, just an internal server error. I noticed that with 3.4.2 this
can indicate the JVM ran out of memory (not surprising if the export is
still being collected into a ByteArrayOutputStream, I guess).

Scott



On 05/18/2011 11:53 AM, Stephen Bayliss wrote:

Hi Scott

Thanks for that feedback.

It would be interesting to find out if you get the same problem using the
AtomZip export format (info:fedora/fedora-system:ATOMZip-1.1).

Steve



-----Original Message-----
From: Scott Hammel [mailto:sc...@clemson.edu]
Sent: 18 May 2011 16:16
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception





Scott, Steve,

REST export in archive format still blows up with Fedora 3.4.2. Actually,
it is crashing on a datastream < 300MB. I gave the JVM 1.5GB of heap, BTW.

Regardless, the exception in fedora.log is a negative array index
exception. According to the stack trace, it looks like it is actually
occurring down in the base64 encoder.

It occurs to me that building a full archival export of arbitrarily large
objects in memory might be pragmatically (is that a word?) impossible:
e.g., on 32-bit systems I think you bump into problems giving the JVM more
than ~1.8 GB of RAM. That alone limits the size of exportable objects to
well under 2GB in that environment.

If I were more adept with Java, I'd volunteer to write an exporter that
spooled to disk, but alas, I am not, and it would take me twice as long
as someone who is. :-(

I can take one of several alternative paths with my particular project,
so it isn't too big an issue for *me*... I just have to do a little
more coding in a middle tier. Don't know about other folks, of course.

-Scott
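A spool-to-disk exporter of the kind described above could look roughly like this. It is a minimal sketch assuming Java 7+ NIO. `ExportWriter` and `spoolAndSend` are hypothetical names, not part of Fedora's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Fedora code: spool an export to a temp file, then send it.
class SpoolToDisk {
    // Hypothetical callback: anything that can write an export to a stream.
    interface ExportWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Heap use stays at small copy buffers whatever the export size;
    // the limit becomes free disk space in the temp directory instead.
    static long spoolAndSend(ExportWriter writer, OutputStream client) throws IOException {
        Path spool = Files.createTempFile("export-", ".xml");
        try {
            try (OutputStream fileOut = Files.newOutputStream(spool)) {
                writer.writeTo(fileOut);
            }
            // Only reached if the whole export was written successfully.
            return Files.copy(spool, client); // number of bytes sent
        } finally {
            Files.deleteIfExists(spool);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        long sent = spoolAndSend(w -> w.write("<export/>".getBytes(StandardCharsets.UTF_8)), client);
        System.out.println(sent + " bytes: " + client.toString("UTF-8")); // 9 bytes: <export/>
    }
}
```

Spooling first, rather than streaming straight to the client, means the full size is known before any bytes are sent. A failure partway through can then still produce a clean error response, at the cost of extra disk I/O.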



On 05/18/2011 01:08 AM, Stephen Bayliss wrote:

Looking at those lines of code, it looks like in theory there would be
a problem there. Once this is confirmed, we should probably add a test
case to the large datastreams test suite. It is also likely to cause a
problem with datastreams smaller than 2GB (2^31-1 being the maximum
array length), because the archive export base64-encodes the content,
which inflates it by about a third.
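The effect of base64 on that limit can be worked out. Unchunked, padded Base64 turns every 3 input bytes into 4 output bytes, so the encoded text reaches the array limit well before the raw datastream does. This illustrative calculation assumes no line breaks in the encoding, and the names are hypothetical.

```java
// Illustrative arithmetic only; assumes unchunked, padded Base64 (no line breaks).
class Base64SizeLimit {
    // Padded Base64 output length for n input bytes: 4 * ceil(n / 3).
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    // Largest input whose encoding fits in one array of at most Integer.MAX_VALUE bytes.
    static long maxRawLength() {
        long maxGroups = Integer.MAX_VALUE / 4; // whole 4-character groups that fit
        return maxGroups * 3;                   // each group encodes 3 input bytes
    }

    public static void main(String[] args) {
        System.out.println(maxRawLength());                // 1610612733
        System.out.println(encodedLength(maxRawLength())); // 2147483644
    }
}
```

That is 3 bytes short of 1.5 GiB (1,610,612,736 bytes). The practical ceiling is lower still. The encoded content shares the buffer with the rest of the XML, chunked Base64 adds line breaks, and many JVMs cannot allocate an array of exactly Integer.MAX_VALUE elements.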



-----Original Message-----
From: Scott Prater [mailto:pra...@wisc.edu]
Sent: 17 May 2011 18:33
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] REST export API negative array index exception





Yes, trying with the latest stable version (3.4.2) would be useful,
if you don't mind. There were some low-level garbage collection
problems that were fixed in the 3.4.2 release; these problems
manifested themselves in a variety of ways.

I'm not saying this is the issue, but it wouldn't hurt to verify that
your problem can be reproduced in 3.4.2.

thanks,

-- Scott



On 05/17/2011 12:22 PM, Scott Hammel wrote:

I'm pretty sure it is 3.4.0 (judging from files on the server, it looks
like an August 2010 build; the server is on a totally isolated network,
with nothing with GUI support that can hit the admin tools).

Tomcat is the version bundled with the Fedora installer.

Would you like me to make sure I'm running the latest version and try
the test scripts again before you go forward?

Scott



On 05/17/2011 12:45 PM, Scott Prater wrote:

Thanks, Scott. I'll try to reproduce the problem in my environment,
Fedora 3.4.2.

Can you tell me what version of Fedora and Tomcat (or other webapp
server) you're using?

-- Scott



On 05/17/2011 11:08 AM, Scott Hammel wrote:

Hey, Scott,

Thanks for responding. I'm more a C/C++ programmer than a Java
programmer (though I sometimes play one on the Internet), so I'm
just guessing on the array bounds. It feels like something is
incrementing an int into the sign bit, though I'd think Java would
throw some array bounds exception before that happened. I figured I'd
do a little math later to test my hypothesis.
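On the array-bounds question, Java int arithmetic does not trap on overflow. It wraps silently, so nothing complains until the wrapped negative value is used as an array size, and at that point the JVM throws `NegativeArraySizeException`. This short sketch shows both behaviours, plus the overflow-checked alternative `Math.multiplyExact`. That method is Java 8+, so it was not available on the Java 1.6 used in this thread, and the names are illustrative.

```java
// Illustrative only: names are hypothetical, not from Fedora or commons-codec.
class OverflowDemo {
    // Computes a buffer size in int arithmetic, then tries to allocate it.
    static String allocate(int length, int multiplier) {
        int size = length * multiplier; // overflow wraps silently; nothing is thrown here
        try {
            byte[] buffer = new byte[size]; // a negative size is rejected only at allocation
            return "allocated " + buffer.length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException for size " + size;
        }
    }

    // Math.multiplyExact (Java 8+) throws ArithmeticException instead of wrapping.
    static int checkedSize(int length, int multiplier) {
        return Math.multiplyExact(length, multiplier);
    }

    public static void main(String[] args) {
        System.out.println(allocate(10, 8));        // allocated 80
        System.out.println(allocate(314572800, 8)); // NegativeArraySizeException for size -1778384896
        try {
            checkedSize(314572800, 8);
        } catch (ArithmeticException e) {
            System.out.println("multiplyExact: " + e.getMessage());
        }
    }
}
```

A wrapped product can also come out zero or positive but too small (for example, 2^29 * 8 wraps to exactly 0). In that case the failure would typically surface later, as an ArrayIndexOutOfBoundsException when the code writes past the end of the undersized array.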



Recall, this was all in a 32-bit environment. I really hope it is a
non-issue and something I'm doing in the end. Note that disseminating the
datastream content directly appears to work OK, which confuses me a
little, though I haven't looked to see whether the code for that does
things differently.



Anyway, here's a series of commands (extracted from my test scripts)
that should reproduce the problem:



mkdir /usr/fedora/tomcat/webapps/ROOT/ingestpool
mkdir /tmp/fedrun
dir=/tmp/fedrun
pid=test:pid01

dd if=/dev/urandom of=/usr/fedora/tomcat/webapps/ROOT/ingestpool/sample.bin bs=1M count=400

./makefoxml $pid http://localhost:8080/ingestpool/sample.bin > $dir/sample.xml

/usr/fedora/client/bin/fedora-ingest.sh f $dir/sample.xml \
    info:fedora/fedora-system:FOXML-1.1 localhost:8080 \
    fedoraAdmin '<insert pwd here>' http

wget -O $dir/export.xml --auth-no-challenge \
    --http-user=fedoraAdmin --http-password='<insert pwd here>' \
    "http://localhost:8080/fedora/objects/$pid/export?context=archive"



Note: I use the REST call via wget rather than the provided export
client scripts because, judging from the Java heap explosion, it looks
to me like the export scripts end up doing the export via the SOAP API.
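The same REST export can be fetched from Java without holding the response in memory. This minimal sketch assumes the export method also accepts a `format` query parameter alongside the `context=archive` used above, for example the ATOMZip format suggested in this thread. It sends HTTP Basic credentials pre-emptively, as `--auth-no-challenge` does for wget, and copies the body straight to disk. `RestExport`, `exportUrl`, and `download` are hypothetical names, and `java.util.Base64` needs Java 8+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

// Hypothetical client-side helper, not part of Fedora's client tools.
class RestExport {
    // Builds an export URL; the format and context values are URL-encoded.
    // Assumes the endpoint accepts `format` as well as `context`.
    static String exportUrl(String baseUrl, String pid, String format, String context)
            throws UnsupportedEncodingException {
        return baseUrl + "/objects/" + pid + "/export"
                + "?format=" + URLEncoder.encode(format, "UTF-8")
                + "&context=" + URLEncoder.encode(context, "UTF-8");
    }

    // Streams the response body to `target` instead of buffering it.
    // An HTTP error status surfaces as an IOException from getInputStream().
    static long download(String url, String user, String password, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        String credentials = user + ":" + password;
        // Pre-emptive Basic auth, like wget's --auth-no-challenge.
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }
}
```

This only keeps the client out of memory trouble. The server-side buffering discussed in this thread is unaffected.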

--

The content of makefoxml:



#!/bin/bash

#usage: makefoxml <pid> <refurl>
#escape slashes in the URL
RF=${2//\//\\/}
#if you need to escape ampersands as well, uncomment this:
#RF=${RF//'&'/'\&'}

# make substitutions ....
sed '
s/PID=""/PID="'"$1"'"/
s/rdf:about=""/rdf:about="info:fedora\/'"$1"'"/
s/dc:identifier>/dc:identifier>'"$1"'/
s/REF=""/REF="'"${RF}"'"/
' < "foxml_tpl.xml"



--

The content of foxml_tpl.xml (the sed script above makes the edits
noted in the XML comments in this template):



<?xml version="1.0" encoding="UTF-8"?>
<!-- following element: set the PID attribute -->
<foxml:digitalObject VERSION="1.1" PID=""
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">

  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE=""/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="fedoraAdmin"/>
  </foxml:objectProperties>

  <foxml:datastream CONTROL_GROUP="X" ID="RELS-EXT">
    <foxml:datastreamVersion FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0"
        ID="RELS-EXT.0" LABEL="RDF Statements about this Object"
        MIMETYPE="application/rdf+xml">
      <foxml:xmlContent>
        <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
            xmlns:fedora-model="info:fedora/fedora-system:def/model#"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
          <!-- following element: put the PID as the value for the rdf:about attribute -->
          <rdf:Description rdf:about="">
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>

  <foxml:datastream CONTROL_GROUP="X" ID="DC" STATE="A" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title></dc:title>
          <dc:creator>Test Program</dc:creator>
          <dc:description>A test object</dc:description>
          <!-- following element: put the PID between the tags -->
          <dc:identifier></dc:identifier>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>

  <foxml:datastream CONTROL_GROUP="M" ID="Content" STATE="A">
    <foxml:datastreamVersion ID="Content.0" LABEL="This is the object content"
        MIMETYPE="application/octet-stream">
      <!-- following element: put the URL to the content file as the value for the REF attribute -->
      <!-- must be an http URL, e.g., http://localhost:8080/ingestpool/foxmldoc.xml -->
      <!-- I just create a directory "ingestpool" under /usr/fedora/tomcat/webapps/ROOT and put the files there -->
      <foxml:contentLocation REF="" TYPE="URL" />
    </foxml:datastreamVersion>
  </foxml:datastream>

</foxml:digitalObject>







On 05/17/2011 10:00 AM, Scott Prater wrote:

Scott,

Can you come up with a test case that confirms this limitation? If
you can provide one, I'll open up a JIRA ticket for the issue.

thanks,

-- Scott



On 05/16/2011 10:45 AM, Scott Hammel wrote:

Oh, I think I see: the last line of the serializer's serialize function
does this:

bytes.toByteArray()

where bytes is a ByteArrayOutputStream.

I *think* the max array index in Java is a 32-bit int: 2,147,483,647
(i.e., 2^31 - 1, the max value of a Java int). So this function will
throw an exception if a datastream "archive" export is > ~2 GB.

scott
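The alternative to `bytes.toByteArray()` is for the serializer to write into the stream it is handed, for example the servlet response. This minimal sketch shows that shape and assumes Java 8+ for `java.util.Base64`. The `<export>`/`<content>` elements and `StreamingArchiveSerializer` are hypothetical. They are not Fedora's FOXML archive format or its serializer API.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical serializer shape, not Fedora's implementation.
class StreamingArchiveSerializer {
    // Writes the envelope and Base64-encoded content straight to `sink`, so no
    // single array ever holds the whole export.
    static void serialize(String pid, InputStream content, OutputStream sink) throws IOException {
        sink.write(("<export pid=\"" + pid + "\"><content>").getBytes(StandardCharsets.UTF_8));

        // The encoder stream only emits its final padding on close(), and closing
        // it closes the wrapped stream too. This shield forwards writes to `sink`
        // but turns close() into flush(), so `sink` stays open for the footer.
        OutputStream shield = new FilterOutputStream(sink) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // `out` is FilterOutputStream's field, i.e. sink
            }

            @Override
            public void close() throws IOException {
                flush();
            }
        };
        try (OutputStream encoder = Base64.getEncoder().wrap(shield)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = content.read(buffer)) != -1) {
                encoder.write(buffer, 0, n);
            }
        }

        sink.write("</content></export>".getBytes(StandardCharsets.UTF_8));
        sink.flush();
    }
}
```

One trade-off is that once bytes have gone to the client, an error halfway through can no longer be turned into a clean HTTP error response. Spooling to a temporary file first avoids that, at the cost of disk I/O.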



On 05/16/2011 11:00 AM, Scott Hammel wrote:

Hi,

Running some export tests using Fedora's REST export API, I get a
negative array index Java exception when doing an "archive" export of an
object at around 400 MB (> 320 MB, < 450 MB).

Fedora is version 3.4-something, running on 32-bit CentOS 5.5 with
Sun Java 1.6 update 21.

Is it just me, or has anyone else seen something like that?

Thanks,
Scott





-----------------------------------------------------------------

-------------

Achieve unprecedented app performance and reliability What

every C/C++ and Fortran developer should know. Learn

how Intel

has extended the reach of its

next-generation tools

to help boost performance applications - inlcuding clusters.

http://p.sf.net/sfu/intel-dev2devmay

_______________________________________________

Fedora-commons-users mailing list

Fedora-commons-users@lists.sourceforge.net



https://lists.sourceforge.net/lists/listinfo/fedora-commons-users


--
Scott Prater
Library, Instructional, and Research Applications (LIRA)
Division of Information Technology (DoIT)
University of Wisconsin - Madison
pra...@wisc.edu


