[ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309660#comment-17309660
 ] 

Bob Tinsman commented on ARROW-7494:
------------------------------------

I was looking for Java issues and this one showed up. I can see that there was 
a lot of work done on it by [~tianchen92] but the activity has petered out, so 
I can take it if no one objects.

I've reviewed the previous work (including the related ARROW-7539) and decided 
to start from scratch. I've been experimenting locally but will put my work 
into a PR soon so I can get people to review it.

I agree with [~jnadeau] that there is leftover stuff from an old 
implementation--ArrowBuf is overloaded. It's a tricky thing to factor out the 
r/w indices, but I think it can be done by looking at the usage patterns. 
Here's a rough summary (feel free to clarify or correct any mistakes):
 * The main pattern is to allocate ArrowBufs and slice them up into the buffers 
used by various vector implementations. This doesn't really require the indices.
 * Some code (the JsonFileReader for example) uses the various byte-twiddling 
methods (like writeByte(), writeShort(), etc.) that make use of the indices.
 * Some core buffer management code like BufferLedger sets the indices; I 
assume this is to reset them to known values before handing out ArrowBufs.

Possible approaches:
 * The original approach: rip out the indices and methods that use them, then 
change all the code using those methods to use equivalent logic
 ** It's a lot of changes, and difficult to get them all right.
 ** It could be a tough merge to other Arrow forks or other projects
 * Minimal approach that saves memory: put the r/w indices in a helper object 
that is allocated as needed; otherwise, no change to ArrowBuf methods
 ** Pro: save 12 bytes in ArrowBuf (4 byte object ref vs. 2 long indices)
 ** Pro: pretty safe, incremental refactoring that you could build on
 ** Con: haven't really cleaned up ArrowBuf
 * Factor out superclass (BaseArrowBuf) without indices, replace use of 
ArrowBuf with BaseArrowBuf where possible (I'm trying this one now)
 ** Pro: ArrowBuf doesn't change
 ** Con: The replacement of ArrowBuf touches a lot of code
 ** Con: Unclear how to change some of the core buffer management code like 
BufferLedger, which sets indices
 * Push indices and related methods into subclass (IndexedArrowBuf)
 ** Pro: clean up ArrowBuf, the original goal
 ** Pro: code that still needs old methods can work on a subclass instance
 ** Pro: core buffer logic still uses ArrowBuf
 ** Con: have to make sure semantics of setting the indices in core buffer 
logic are still the same

> [Java] Remove reader index and writer index from ArrowBuf
> ---------------------------------------------------------
>
>                 Key: ARROW-7494
>                 URL: https://issues.apache.org/jira/browse/ARROW-7494
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Java
>            Reporter: Jacques Nadeau
>            Assignee: Ji Liu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to