Title: RFE: 64 bit pointers needed
Author: Justin Senseney
Organization: National Institutes of Health
Owner: Justin Senseney
Created: 2012/04/17
Type: Feature
State: Draft
Exposure: Open
Component: core/lang
Scope: JDK
JSR: TBD
RFE: 4963452 (4850923, 4880587, 4088441, 6292967)
Discussion: [email protected]
Start: 2012/Q3
Depends:
Blocks:
Effort: XL
Duration: L
Template: 1.0
Internal-refs:
Reviewed-by:
Endorsed-by:
Funded-by:

Summary
-------

As per the Java Language Specification, section 10.4, all array access in Java is done by using an int as the index. Since an int is a signed 32-bit value, this limits the number of addressable elements of an array to 2**31 - 1 (about 2 billion). It should be possible to address an array using 64-bit values.

Goals
-----

Improved handling of large datasets that need to be stored in contiguous arrays.

Non-Goals
---------

Changing the existing range of Integer is not a goal.

Success Metrics
---------------

The following statement compiles: boolean[] a = new boolean[Long.MAX_VALUE];

Motivation
----------

While having access to 2 billion entries may seem sufficient, there are very compelling performance reasons to be able to use more in a single array. As an example, consider a square n*n matrix stored as a single array (row major or column major, it does not matter which). Since an array stores at most 2**31 - 1 entries, n can be at most 46340 (the integer part of sqrt(2**31)), so the matrix cannot be very large. For multidimensional arrays the limitation is even more severe: a 3D tensor with equal dimensions can be at most of size 1290 along each axis.

Description
-----------

The scope of this work is extensive; however, the solution may be quite technically feasible.

Alternatives
------------

A workaround is to use an array of arrays (e.g. double[][]). However, there is no guarantee that successive rows will be laid out linearly in memory, and therefore performance may be severely penalized. Experimentally, performance can suffer by a factor of more than 2, and often far more. Also, most existing matrix packages (e.g. LAPACK) assume linear storage and are thus incompatible with double[][] storage (they require double[]). Calling a LAPACK routine with jagged storage thus requires extra array copying and memory allocation, which can further decrease performance and increase memory requirements.

Testing
-------

It should be possible to address arrays using 64-bit integers (long?), as this provides a seamless transition for users of 64-bit computers.

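To make the array-of-arrays workaround from the Alternatives section concrete, here is a minimal Java sketch of a long-indexed container built from double[] chunks. The class name ChunkedDoubleArray, the method rowMajorIndex, and the chunk size of 2**20 elements are all hypothetical choices made for illustration; none of them is part of the JDK or of this proposal. The sketch also shows that the last element of a 46341*46341 row-major matrix already has an index beyond Integer.MAX_VALUE.

```java
/**
 * Hypothetical helper, not part of the JDK: a double "array" addressed by a
 * long index and backed by fixed-size double[] chunks.
 */
final class ChunkedDoubleArray {
    // Chunk size is an arbitrary assumption: 2^20 elements (8 MiB of doubles).
    private static final int CHUNK_BITS = 20;
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final double[][] chunks;
    private final long length;

    ChunkedDoubleArray(long length) {
        if (length < 0) {
            throw new NegativeArraySizeException(Long.toString(length));
        }
        // Number of chunks, rounding up without risking long overflow.
        long chunkCount = (length >>> CHUNK_BITS) + ((length & CHUNK_MASK) != 0 ? 1 : 0);
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("length too large: " + length);
        }
        this.length = length;
        this.chunks = new double[(int) chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunks.length; i++) {
            // Every chunk is full except possibly the last one.
            chunks[i] = new double[(int) Math.min(CHUNK_SIZE, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() {
        return length;
    }

    double get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, double value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    /** Linear index of element (row, col) in an n*n row-major matrix. */
    static long rowMajorIndex(long row, long col, long n) {
        return row * n + col;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Each access costs two array lookups plus a shift and a mask, and the chunks are separate heap objects, so the data is not contiguous and cannot be handed to code that expects a single double[]. That is the drawback the Alternatives section describes, and it is why a wrapper of this kind is a workaround rather than a substitute for 64-bit array indexing.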
Risks and Assumptions
---------------------

Array-of-array constructs (double[][] instead of double[]) remain available as a workaround. C and C++ support this feature without any problem, so it should be quite technically feasible to implement.

Dependences
-----------

None.

Impact
------

My group has requested this feature for several years. It is currently listed as one of the top 25 RFEs on http://bugs.sun.com/top25_rfes.do. Please help Java maintain its relevance by implementing this. I have several image processing applications that are severely limited by this bug; the images they handle cannot be opened in most Java applications. These include electron microscopy and micro-CT images, where storing a single slice requires more entries than a Java array allows.

Thank you for considering this RFE,

Justin Senseney
BIRSS/ISL/DCB/CIT/NIH
301-594-5887
301-480-0028 (fax)
Building 12A/2015
http://mipav.cit.nih.gov
http://dcb.cit.nih.gov/~senseneyj
http://image.nih.gov
