GavinRay97 commented on issue #12618: URL: https://github.com/apache/arrow/issues/12618#issuecomment-1070960313
Hey Micah, thanks for taking the time to leave a thorough set of comments.

> Would the intention be to have a mapping for all Arrow data types from a java object? I think some of the existing getObject calls don't return the optimal types; would the intention be to follow those mappings when possible?

Take all of my answers to this and the following questions with a grain of salt (I'm deeply unfamiliar with Arrow), but yes, where possible. I know that some Arrow types may not map well to JVM primitives, and I'm unsure what the best thing to do there is (maybe raw bytes?). But otherwise yes, whatever is the best-fit/most optimal Arrow -> JVM type mapping is the hope. I just don't know enough about Arrow to be a good judge of what that is at the moment.

> I'm hesitant to create a class named Dataframe in the project just for easy conversion back and forth between tuples. I think DataFrames come with a lot of expectations and in particular it seems like the canonical memory representation here seems to be row-based on-heap objects, I would expect an implementation to use a columnar representation (and at least use the concept of Vectors for columns even if VectorSchemaRoot isn't used).

This is fair. I had originally implemented this in my own project as `Table`, since it represents row-based/tabular data, but I thought that might be too confusing. Not sure what the best naming convention here is. But I do agree, it should be something that conveys that the data is non-columnar and that there is a loss of efficiency.

> I started a mailing list discussion on minimum Java version, but I believe we should be targeting at most JDK 11 for the time being.

Also agreed. In this case I used `record` just for brevity's sake, to avoid boilerplate in the code.

> for conversion from strings you need to pass UTF_ENCODING to avoid brittleness in conversion.

Noted 👍

> I think trying to implement this in the pattern [Loader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorLoader.html) and [Unloader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorUnloader.html). Maybe a new interface like VectorRowLoader and VectorRowUnloader? If the goal is to interface well with flight I think this might be the most ergonomic.

Your judgement is better than mine -- I might need a bit of guidance on how to do this/the overall approach though.

> This probably belongs in a new contrib module, but I think this would lower the barrier for entry, so if you are willing to contribute something I'd be willing to help review.

Sure, I think it'd be a valuable addition and I love to contribute to OSS in an impactful way when I can.
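
To make the row-based vs. columnar distinction concrete, here is a rough sketch of the idea in plain Java, written without `record` so it compiles on JDK 11. Everything in it is hypothetical: `RowTable`, `toColumns`, and `encodeUtf8` are illustrative names, not Arrow classes or the code from my project, and it uses only the standard library. It holds rows on the heap, pivots them into one list per column (the shape a columnar vector would be filled from), and encodes strings with an explicit UTF-8 charset instead of the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row-oriented container (NOT an Arrow class).
final class RowTable {
    private final List<String> columnNames;
    private final List<List<Object>> rows = new ArrayList<>();

    RowTable(List<String> columnNames) {
        this.columnNames = List.copyOf(columnNames);
    }

    // Append one row; the number of values must match the number of columns.
    void addRow(Object... values) {
        if (values.length != columnNames.size()) {
            throw new IllegalArgumentException(
                "expected " + columnNames.size() + " values, got " + values.length);
        }
        rows.add(new ArrayList<>(Arrays.asList(values)));
    }

    int rowCount() {
        return rows.size();
    }

    // Pivot the rows into one list per column, preserving column order.
    Map<String, List<Object>> toColumns() {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (int c = 0; c < columnNames.size(); c++) {
            List<Object> column = new ArrayList<>(rows.size());
            for (List<Object> row : rows) {
                column.add(row.get(c));
            }
            columns.put(columnNames.get(c), column);
        }
        return columns;
    }

    // Encode a column's values as UTF-8 bytes, passing the charset explicitly
    // rather than relying on the platform default. Nulls stay null.
    static List<byte[]> encodeUtf8(List<Object> column) {
        List<byte[]> out = new ArrayList<>(column.size());
        for (Object value : column) {
            out.add(value == null ? null : value.toString().getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }
}

final class RowTableDemo {
    public static void main(String[] args) {
        RowTable table = new RowTable(List.of("id", "name"));
        table.addRow(1, "Zo\u00eb");
        table.addRow(2, "Ana");

        Map<String, List<Object>> columns = table.toColumns();
        System.out.println(columns.keySet());

        // Round-trip one string through UTF-8 with the charset stated on both sides.
        byte[] encoded = RowTable.encodeUtf8(columns.get("name")).get(0);
        System.out.println(new String(encoded, StandardCharsets.UTF_8));
    }
}
```

A real implementation would presumably write each column into an Arrow vector rather than a `List<Object>`; the pivot step is only meant to show where the row-to-column conversion cost comes from.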
