GavinRay97 commented on issue #12618:
URL: https://github.com/apache/arrow/issues/12618#issuecomment-1070960313


   Hey Micah, thanks for taking the time to leave a thorough set of comments.
   
   > Would the intention be to have a mapping for all Arrow data types from a 
Java object? I think some of the existing getObject calls don't return the 
optimal types; would the intention be to follow those mappings when possible?
   
   Take all of my answers to this and following questions with a grain of salt 
(I'm deeply unfamiliar with Arrow), but -- yes, where possible.
   
   I know that some Arrow types may not map well to JVM primitives, and I'm 
unsure what the best approach is for those (maybe raw bytes?). Otherwise, yes: 
the hope is to use whatever Arrow -> JVM type mapping fits best. I just don't 
know enough about Arrow yet to be a good judge of what that is.
   
   > I'm hesitant to create a class named Dataframe in the project just for easy 
conversion back and forth between tuples. I think DataFrames come with a lot of 
expectations, and in particular the canonical memory representation here seems 
to be row-based on-heap objects; I would expect an implementation to use a 
columnar representation (and at least use the concept of Vectors for columns 
even if VectorSchemaRoot isn't used).
   
   This is fair. I had originally implemented this in my own project as `Table` 
since it represents row-based/tabular data, but I thought that might be too 
confusing. 
   
   I'm not sure what the best name is here, but I do agree that it should 
convey that the data is non-columnar and that this comes with a loss of 
efficiency.
   
   > I started a mailing list discussion on minimum Java version, but I believe 
we should be targeting at most JDK 11 for the time being.
   
   Also agreed. In this case I used `record` only for brevity, to avoid 
boilerplate in the code.
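   
   For anyone wanting to run this on JDK 11, a record can be replaced by a 
small final class. A minimal sketch, assuming a hypothetical `Column` type with 
a name and a value (it is an illustration, not taken from the actual code):
   
```java
import java.util.Objects;

// Hypothetical example type. On JDK 16+ this could be written as:
//   record Column(String name, Object value) {}
// The class below is a hand-written stand-in that compiles on JDK 11.
final class Column {
    private final String name;
    private final Object value;

    Column(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    // Records expose accessors named after their components (no "get" prefix).
    String name() { return name; }
    Object value() { return value; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Column)) return false;
        Column other = (Column) o;
        return Objects.equals(name, other.name) && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }

    @Override
    public String toString() {
        return "Column[name=" + name + ", value=" + value + "]";
    }
}
```
   
   It is more verbose, which is why I reached for `record` in the first place, 
but it gives the same value-based equality.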
   
   > for conversion from strings you need to pass UTF_ENCODING to avoid 
brittleness in conversion.
   
   Noted 👍 
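   
   As I understand it, this means passing an explicit charset rather than 
relying on the platform default. A minimal sketch using `StandardCharsets.UTF_8` 
from the JDK; the `Utf8Strings` helper name is hypothetical:
   
```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: always encodes and decodes with an explicit UTF-8
// charset, so the result does not depend on the JVM's default charset.
final class Utf8Strings {
    private Utf8Strings() {}

    static byte[] encode(String s) {
        // s.getBytes() without an argument would use the platform default.
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```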
   
   > I think it would be worth trying to implement this in the pattern of 
[Loader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorLoader.html)
 and 
[Unloader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorUnloader.html).
 Maybe a new interface like VectorRowLoader and VectorRowUnloader? If the goal 
is to interface well with flight I think this might be the most ergonomic.
   
   Your judgement is better than mine, though I might need a bit of guidance 
on the overall approach and how to implement it.
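   
   To check my understanding of the suggestion, here is a rough sketch of the 
two directions using plain lists as stand-ins for Arrow vectors. 
`RowColumnSketch`, `toColumns`, and `toRows` are hypothetical names, not Arrow 
APIs; a real VectorRowLoader/VectorRowUnloader would presumably write into and 
read from Arrow vectors instead.
   
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an Arrow API: plain lists stand in for vectors.
final class RowColumnSketch {
    private RowColumnSketch() {}

    // "Load" direction: turn row-based tuples into one list per column.
    static List<List<Object>> toColumns(List<Object[]> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>(columnCount);
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>(rows.size()));
        }
        for (Object[] row : rows) {
            if (row.length != columnCount) {
                throw new IllegalArgumentException(
                    "expected " + columnCount + " values, got " + row.length);
            }
            for (int c = 0; c < columnCount; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    // "Unload" direction: rebuild row tuples from the per-column lists.
    static List<Object[]> toRows(List<List<Object>> columns) {
        int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
        List<Object[]> rows = new ArrayList<>(rowCount);
        for (int r = 0; r < rowCount; r++) {
            Object[] row = new Object[columns.size()];
            for (int c = 0; c < columns.size(); c++) {
                row[c] = columns.get(c).get(r);
            }
            rows.add(row);
        }
        return rows;
    }
}
```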
   
   > This probably belongs in a new contrib module, but I think this would 
lower the barrier for entry, so if you are willing to contribute something I'd 
be willing to help review.
   
   Sure, I think it'd be a valuable addition and I love to contribute to OSS in 
an impactful way when I can.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


