jorisvandenbossche commented on pull request #11385: URL: https://github.com/apache/arrow/pull/11385#issuecomment-941013899
This is a somewhat hacky solution; I'm not sure it is robust enough to be accepted (or whether it is the best approach). Currently, the mapping from string column name to Parquet field index is done on the Python side. It is based on the FileMetaData.SchemaDescriptor: we iterate through the columns and get the dotted path of each column. The problem is that, at that point, there is no easy way to know (AFAIK) whether a column (ColumnDescriptor) belongs to a list type, which is what we would need in order to also construct the shorter version of the dotted path. So I did this at the C++ level instead, simply by exposing, in addition to the default dotted path, a "shorter" dotted path that excludes the list inner elements.
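To make the idea concrete, here is a minimal pure-Python sketch of the name-to-index lookup described above. It is not the actual pyarrow code. The column names, the `list`/`item` path components, and the helper `build_name_to_indices` are hypothetical, and the sketch assumes both the full and the shortened dotted path are already available for each leaf column (which is what the C++ change would expose).

```python
from collections import defaultdict

# Hypothetical (full_path, short_path) pairs, one per leaf column, in leaf order.
# The short path omits the list-internal components ("list", "item" here).
LEAF_PATHS = [
    ("id", "id"),
    ("tags.list.item", "tags"),
    ("point.x", "point.x"),
    ("point.coords.list.item", "point.coords"),
]


def build_name_to_indices(leaf_paths):
    """Map every full path, short path, and path prefix to leaf column indices."""
    lookup = defaultdict(list)
    for index, (full_path, short_path) in enumerate(leaf_paths):
        names = set()
        for path in (full_path, short_path):
            parts = path.split(".")
            # Register each prefix so that selecting "point" picks all its leaves.
            for end in range(1, len(parts) + 1):
                names.add(".".join(parts[:end]))
        for name in names:
            lookup[name].append(index)
    return dict(lookup)


NAME_TO_INDICES = build_name_to_indices(LEAF_PATHS)
```

With this lookup, both `"point.coords"` and `"point.coords.list.item"` resolve to the same leaf index, so a user can select a list column nested in a struct without spelling out the list inner elements.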