Title: RFE: 64 bit pointers needed
Author: Justin Senseney
Organization: National Institutes of Health
Owner: Justin Senseney
Created: 2012/04/17
Type: Feature
State: Draft
Exposure: Open
Component: core/lang
Scope: JDK
JSR: TBD
RFE: 4963452 (4850923, 4880587, 4088441, 6292967)
Discussion: [email protected]
Start: 2012/Q3
Depends:
Blocks:
Effort: XL
Duration: L
Template: 1.0
Internal-refs:
Reviewed-by:
Endorsed-by:
Funded-by:

Summary
-------

As per the Java Language Specification, section 10.4, all array access in Java 
is done by using an int as index. Since an int is a signed 32-bit value, this 
limits the number of addressable elements of an array to 2**31 - 1 (about 2 
billion). It should be possible to address an array using 64-bit values.

Goals
-----

Improved handling of large datasets that need to be stored in contiguous arrays.

Non-Goals
---------

This proposal does not change the existing range of int or Integer.

Success Metrics
---------------

The following statement compiles: boolean[] a = new boolean[Long.MAX_VALUE];

Motivation
----------

While having access to 2 billion entries may seem sufficient, there are 
compelling performance reasons to be able to use more in a single array. As an 
example, consider a square n*n matrix stored as a single array (row or column 
major, it does not matter which). Since an array stores at most 2**31 - 1 
entries, n can be at most floor(sqrt(2**31 - 1)) = 46340, so the matrix cannot 
be very large. For higher-dimensional data the limitation is even more severe: 
a cubic 3D tensor can have a side length of at most 1290.
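
The size limits above can be checked with a few lines of Java. This is only an 
illustrative sketch; the class and method names (ArrayLimits, maxSide) are 
made up for this example and are not part of any existing API.

```java
// Largest side length n of a dense d-dimensional cube that still fits
// in a single Java array (at most Integer.MAX_VALUE elements).
final class ArrayLimits {

    /** Returns the largest n such that n^dims <= Integer.MAX_VALUE. */
    static int maxSide(int dims) {
        if (dims < 1) throw new IllegalArgumentException("dims must be >= 1");
        long limit = Integer.MAX_VALUE; // 2^31 - 1
        // Floating-point estimate, then corrected with exact integer arithmetic.
        long n = (long) Math.floor(Math.pow(limit, 1.0 / dims));
        while (pow(n + 1, dims) <= limit) n++;
        while (pow(n, dims) > limit) n--;
        return (int) n;
    }

    /** base^exp, saturating to Long.MAX_VALUE once the int range is exceeded. */
    static long pow(long base, int exp) {
        long r = 1;
        for (int i = 0; i < exp; i++) {
            r *= base;
            if (r > Integer.MAX_VALUE) return Long.MAX_VALUE;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(maxSide(2)); // side of the largest square matrix
        System.out.println(maxSide(3)); // side of the largest cubic 3D tensor
    }
}
```

With the current int-indexed arrays, maxSide(2) is 46340 and maxSide(3) is 
1290.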

Description
-----------

The scope of this work is extensive; however, the solution should be 
technically feasible.

Alternatives
------------

A workaround is to use an array of arrays (e.g. double[][]). However, there is 
no guarantee that successive rows will be laid out linearly in memory, and 
therefore performance may be severely penalized. Experimentally, performance 
may suffer by a factor of over 2, and often far more.
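
To make the workaround concrete, here is a minimal sketch of a long-indexed 
wrapper built on double[][]. The class name ChunkedDoubleArray and the chunk 
size of 2**20 elements are assumptions chosen for illustration, not an 
existing library. Every access needs a bounds check plus two array lookups, 
and each chunk is a separate heap object, so nothing guarantees the chunks 
are adjacent in memory.

```java
// Hypothetical long-indexed array emulated with fixed-size chunks.
final class ChunkedDoubleArray {
    private static final int CHUNK_BITS = 20;              // assumed chunk size: 2^20 elements
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final double[][] chunks;
    private final long length;

    ChunkedDoubleArray(long length) {
        if (length < 0) throw new NegativeArraySizeException(Long.toString(length));
        this.length = length;
        // Number of chunks, rounded up; toIntExact rejects absurdly large requests.
        int numChunks = Math.toIntExact((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new double[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new double[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    long length() {
        return length;
    }

    double get(long index) {
        checkIndex(index);
        // High bits select the chunk, low bits select the slot inside it.
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, double value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

A caller would write a.get(i) and a.set(i, v) with a long i instead of a[i], 
which is exactly the kind of indirection that native 64-bit indexing would 
remove.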

Also, most existing matrix packages (e.g. LAPACK) assume linear storage and 
are thus incompatible with double[][] storage (they require double[]). Calling 
a LAPACK routine with jagged storage thus requires extra array copying and 
memory allocation, which can further decrease performance and increase memory 
requirements.
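
The extra copy looks roughly like the sketch below. LAPACK expects 
column-major storage; the class and method names (MatrixFlatten, 
toColumnMajor) are hypothetical and only illustrate the cost. Note that the 
flat copy is itself subject to the int length limit, so this path does not 
help for matrices that exceed it.

```java
// Hypothetical helper: copy a rectangular double[][] into one column-major double[].
final class MatrixFlatten {

    static double[] toColumnMajor(double[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        long total = (long) rows * cols;
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("matrix too large for a single array: " + total);
        }
        double[] flat = new double[(int) total]; // second full-size allocation
        for (int i = 0; i < rows; i++) {
            if (a[i].length != cols) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            for (int j = 0; j < cols; j++) {
                flat[j * rows + i] = a[i][j];    // element (i, j) in column-major order
            }
        }
        return flat;
    }
}
```

For a 2x3 matrix {{1, 2, 3}, {4, 5, 6}} the result is {1, 4, 2, 5, 3, 6}. The 
copy temporarily doubles the memory needed for the matrix.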


Testing
-------

It should be possible to address arrays using 64-bit integers (long?), as this 
provides a seamless transition for users of 64-bit computers.

Risks and Assumptions
---------------------

An array-of-arrays construct (double[][] instead of double[]) is possible as a 
workaround. Arrays indexed by 64-bit values are well supported in C/C++ 
without any problem, so this feature should be technically feasible to 
implement.

Dependences
-----------

None.

Impact
------

My group has requested this feature for several years. It is currently listed 
as one of the top 25 RFEs on http://bugs.sun.com/top25_rfes.do. Please help 
Java maintain its relevance by implementing this. I have several image 
processing applications that are severely limited by this restriction; the 
images involved cannot be opened in most Java applications. These include 
electron microscopy and micro-CT images where storage of a single slice 
requires more entries than are allowed in a Java array.



Thank you for considering this RFE,
Justin Senseney
BIRSS/ISL/DCB/CIT/NIH
301-594-5887
301-480-0028 (fax)
Building 12A/2015

http://mipav.cit.nih.gov
http://dcb.cit.nih.gov/~senseneyj
http://image.nih.gov
