[GitHub] [arrow] lidavidm commented on issue #14039: How to get Arrow Schema for PostgreSQL column of hstore(map) and geometry types in JdbcToArrowUtils.jdbcToArrowSchema ?

GitBox Tue, 06 Sep 2022 06:17:31 -0700


lidavidm commented on issue #14039:
URL: https://github.com/apache/arrow/issues/14039#issuecomment-1238138823


   > Hello @lidavidm Thank you for your time and knowledge sharing!
   > 
   > > For Map: it'd be reasonable to add support directly
   > 
   > Sounds perfect! Something similar to the 
[following](https://github.com/apache/arrow/blob/3e40cd3648a5b4f6ee7203a6c408f08c0abe4696/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L274)
 snippet for the schema: `if (arrowType.getTypeID() == ArrowType.ArrowTypeID.Map) { 
FieldType mapType = new FieldType(false, ArrowType.Struct.INSTANCE, null, 
null); FieldType keyType1 = new FieldType(false, new ArrowType.Utf8(), null, 
null); children = new ArrayList<>(); children.add(new Field("child", mapType, 
Arrays.asList(new Field(MapVector.KEY_NAME, keyType1, null), new 
Field(MapVector.VALUE_NAME, keyType1, null)))); }`
   
   Yes - if you want to file a Jira and/or PR they would be much appreciated. 
(For anything here, I'm generally supportive, but I don't have too much time 
right now to do more than review code.)
   
   > 
   > > For geometry type: the question would be, what type do you expect this 
to be mapped to?
   > 
   > binary representation is OK for me for current task
   > 
   
   That sounds good to me. If more capabilities are required, you may want 
to follow the discussion about "canonical extension types": 
https://lists.apache.org/thread/qxc1g7h9ow79qt6r7sqtgbj8mdbdgnhb 
   
   This would let us define a new "type" for Postgres-style geometry data which 
is just metadata over an existing type (such as binary).
   
   > > And would it be useful to extend the JDBC adapter with full control over 
the Arrow type support, so that you can add custom types at runtime?
   > 
   > It could be a good solution for custom type mapping in complex cases. The 
first step is to [add the column 
index](https://github.com/apache/arrow/blob/3e40cd3648a5b4f6ee7203a6c408f08c0abe4696/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L260)
 in `final ArrowType arrowType = 
config.getJdbcToArrowTypeConverter().apply(columnIndex, columnFieldInfo);`. That 
would allow end users to write custom type mappings (leveraging JDBC 
driver-specific features and types). The same goes for array element types: it 
is much easier to get the element type by column index than to iterate over all 
columns to collect data for jdbcToArrowConfig.setArraySubTypeByColumnIndexMap().
   
   Ok, cool. If you have an API proposal, feel free to file a Jira and/or PR - 
I think it makes sense to allow more flexibility here, given that we allow it 
in the 'opposite direction' anyways.
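
To make the index-aware converter idea concrete, here is a minimal sketch in plain Java. It is not the adapter's API: `IndexedTypeConverterSketch`, `ColumnInfo`, `defaultMapping` and `withOverrides` are hypothetical names, Arrow types are represented by descriptive strings so the example needs only the JDK, and the column index is assumed to be 1-based as in JDBC.

```java
import java.sql.Types;
import java.util.Map;
import java.util.function.BiFunction;

/** Sketch only: every name here is hypothetical, not part of the Arrow JDBC adapter. */
final class IndexedTypeConverterSketch {

  /** Stand-in for the per-column info a converter receives. */
  static final class ColumnInfo {
    final int jdbcType;      // a java.sql.Types constant
    final String typeName;   // driver-specific name, e.g. "hstore" or "geometry"

    ColumnInfo(int jdbcType, String typeName) {
      this.jdbcType = jdbcType;
      this.typeName = typeName;
    }
  }

  /** Default mapping keyed on the JDBC type code only; Arrow types are plain strings here. */
  static String defaultMapping(ColumnInfo info) {
    switch (info.jdbcType) {
      case Types.INTEGER:
        return "Int(32, signed)";
      case Types.VARCHAR:
        return "Utf8";
      case Types.VARBINARY:
        return "Binary";
      default:
        return "unsupported";
    }
  }

  /** Returns a converter that checks per-column overrides (1-based index) before the default. */
  static BiFunction<Integer, ColumnInfo, String> withOverrides(Map<Integer, String> overrides) {
    return (columnIndex, info) -> {
      String custom = overrides.get(columnIndex);
      return custom != null ? custom : defaultMapping(info);
    };
  }
}
```

With something of this shape, a user could map column 2 (say, an hstore column the driver reports as `Types.OTHER`) to a map type without touching the default handling of the other columns.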
   
   > 
   > I also have a proposal for column/schema-level metadata: propagate the 
"comment" for tables/columns 
[here](https://github.com/apache/arrow/blob/3e40cd3648a5b4f6ee7203a6c408f08c0abe4696/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L249)
 by using java.sql.DatabaseMetaData#getColumns / 
java.sql.DatabaseMetaData#getTables from connection.getMetaData(), or just allow 
the user to provide it from a custom comment handler in JdbcToArrowConfig. It 
would be very useful metadata in real-life (medium- to large-scale) projects for 
documentation and maintenance. Apache Spark uses the "comment" key for 
such metadata, so it looks like a reasonable default name for this metadata in 
the Arrow schema too.
   
   Interesting. A contribution would be welcome here as well - I think it makes 
sense to propagate the JDBC metadata into the Arrow metadata if it's useful for 
applications.
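
As a rough illustration of the comment-propagation idea, the sketch below collects column comments through the standard `java.sql.DatabaseMetaData#getColumns` call, which exposes them in the `REMARKS` column of its result set. The class `ColumnComments` and its methods are hypothetical helpers, not part of the adapter, and the `"comment"` key simply follows the default name suggested above. Attaching the resulting maps to Arrow fields is left out.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: a hypothetical helper, not part of the Arrow JDBC adapter. */
final class ColumnComments {

  /** Metadata key for comments; assumed here, following the suggestion in this thread. */
  static final String COMMENT_KEY = "comment";

  /** Wraps a column remark as a metadata map; a missing or empty remark yields an empty map. */
  static Map<String, String> toMetadata(String remarks) {
    if (remarks == null || remarks.isEmpty()) {
      return Collections.emptyMap();
    }
    return Collections.singletonMap(COMMENT_KEY, remarks);
  }

  /** Reads REMARKS for every column of one table; keys are column names in result-set order. */
  static Map<String, Map<String, String>> forTable(
      DatabaseMetaData meta, String schema, String table) throws SQLException {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    // "%" matches all column names; a null catalog means "do not filter by catalog".
    try (ResultSet rs = meta.getColumns(null, schema, table, "%")) {
      while (rs.next()) {
        result.put(rs.getString("COLUMN_NAME"), toMetadata(rs.getString("REMARKS")));
      }
    }
    return result;
  }
}
```

A caller would obtain `meta` from `connection.getMetaData()` and merge each per-column map into the metadata of the matching Arrow field. Whether a driver actually fills in `REMARKS` depends on the driver and its configuration.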
   


