[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214901#comment-16214901
 ] 

ASF GitHub Bot commented on ARROW-1474:
---------------------------------------

siddharthteotia commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338598989
 
 
   **W.r.t introduction of some static interfaces for JsonFileWriter/Reader**
   
   The introduction of couple of static interfaces is not absolutely necessary. 
They are written for better readability in JsonFileReader's gigantic switch 
block when it parses Json and writes to the vector (and its inner vectors). 
Since now we no longer have inner vectors, we obviously couldn't leverage the 
same code. The JsonFileReader had to be changed to specifically write to 
different buffers (TYPE, VALIDITY, OFFSET, DATA) for a particular vector. Also 
it has to allocate the buffer and appropriately set writer index before calling 
loadFieldBuffers. This is something that was needed for every case in switch 
block here. Once I did this, the code looked pretty messy and ugly. So I moved 
all the logic private to vectors and made them as part of static interfaces.
   
   On the other hand, in JsonFileWriter we were reading from vector (and its 
inner vectors) and writing out Json data. Again, since there are no inner 
vectors, all operation had to be transformed to work at the buffer level -- for 
writing the contents of each inner buffer. Also, the old code of JsonFileWriter 
stated a TODO that it was not handling each type. The new code handles all 
types.
   
   If the general preference is to not introduce static interfaces in vector 
APIs, I am fine with removing them and moving all logic into JSon code itself. 
The javadocs already indicate that external use of these APIs is not 
recommended.
   
   **W.r.t introduction of some new APIs in ValueVector**
   
   Note that top level interface is still ValueVector even though hierarchy 
underneath has changed. So there are still non-nullable vectors extending 
ValueVector, implementing mutator/accessor interfaces etc.
   
   So I introduced APIs like getNullCount(), getValueCount(), setValueCount(), 
getObject() for the new nullable vectors. Once we remove non-nullable vectors 
and expose mutator/accessor functions as direct get/set in ValueVector, we can 
get rid of these APIs too. 
   
   User is free to call such methods on vectors since internally they delegate 
the call to corresponding mutator/accessor operation for non-nullable vectors 
and for nullable vectors we already have the new implementation. For legacy 
vectors, it doesn't really matter since each operation is just a pass-through 
to new code.
   
   There aren't any placeholder interfaces anywhere. Each vector (nullable or 
non-nullable) has a concrete implementation of all such interfaces as 
prescribed by ValueVector. Correctness is not affected anywhere. We should be 
able to do the simple cleanup once we remove non-nullable vectors. If we are 
not planning to remove non-nullable vectors then we should just remove 
mutator/accessor from them and expose all the get/set APIs directly just like 
we have done for other nullable and complex vectors. That will also allow us to 
do simple cleanup.
   
   Whatever we decide to do with non-nullable scalar vectors, we should do soon 
to make the entire java Vector code under ValueVector hierarchy consistent.
   
   Right now the nullable scalars and complex types are consistent -- none of 
them have inner vectors and none of them support mutator/accessor based access. 
Either we should do the same thing with non nullable vectors or remove them all 
together. The latter is preferable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -----------------------------------------------------
>
>                 Key: ARROW-1474
>                 URL: https://issues.apache.org/jira/browse/ARROW-1474
>             Project: Apache Arrow
>          Issue Type: Sub-task
>            Reporter: Jacques Nadeau
>            Assignee: Siddharth Teotia
>              Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to