[
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309660#comment-17309660
]
Bob Tinsman commented on ARROW-7494:
------------------------------------
I was looking for Java issues and this one showed up. I can see that there was
a lot of work done on it by [~tianchen92] but the activity has petered out, so
I can take it if no one objects.
I've reviewed the previous work (including the related ARROW-7539) and decided
to start from scratch. I've been experimenting locally but will put my work
into a PR soon so I can get people to review it.
I agree with [~jnadeau] that there is leftover stuff from an old
implementation--ArrowBuf is overloaded. It's a tricky thing to factor out the
r/w indices, but I think it can be done by looking at the usage patterns.
Here's a rough summary (feel free to clarify or correct any mistakes):
* The main pattern is to allocate ArrowBufs and slice them up into the buffers
used by various vector implementations. This doesn't really require the indices.
* Some code (the JsonFileReader for example) uses the various byte-twiddling
methods (like writeByte(), writeShort(), etc.) that make use of the indices.
* Some core buffer management code like BufferLedger sets the indices; I
assume this is to reset them to known values before handing out ArrowBufs.
Possible approaches:
* The original approach: rip out the indices and methods that use them, then
change all the code using those methods to use equivalent logic
** It's a lot of changes, and difficult to get them all right.
** It could be a tough merge to other Arrow forks or other projects
* Minimal approach that saves memory: put the r/w indices in a helper object
that is allocated as needed; otherwise, no change to ArrowBuf methods
** Pro: save 12 bytes in ArrowBuf (4 byte object ref vs. 2 long indices)
** Pro: pretty safe, incremental refactoring that you could build on
** Con: haven't really cleaned up ArrowBuf
* Factor out superclass (BaseArrowBuf) without indices, replace use of
ArrowBuf with BaseArrowBuf where possible (I'm trying this one now)
** Pro: ArrowBuf doesn't change
** Con: The replacement of ArrowBuf touches a lot of code
** Con: Unclear how to change some of the core buffer management code like
BufferLedger, which sets indices
* Push indices and related methods into subclass (IndexedArrowBuf)
** Pro: clean up ArrowBuf, the original goal
** Pro: code that still needs old methods can work on a subclass instance
** Pro: core buffer logic still uses ArrowBuf
** Con: have to make sure semantics of setting the indices in core buffer
logic are still the same
> [Java] Remove reader index and writer index from ArrowBuf
> ---------------------------------------------------------
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
> Issue Type: Task
> Components: Java
> Reporter: Jacques Nadeau
> Assignee: Ji Liu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Reader and writer index and functionality doesn't belong on a chunk of memory
> and is due to inheritance from ByteBuf. As part of removing ByteBuf
> inheritance, we should also remove reader and writer indexes from ArrowBuf
> functionality. It wastes heap memory for rare utility. In general, a slice
> can be used instead of a reader/writer index pattern.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)