westonpace commented on issue #30648: URL: https://github.com/apache/arrow/issues/30648#issuecomment-1409155308
@khoatrandata if I understand that issue correctly, the user is trying to load a column of type `jsonb` into Arrow. There is no equivalent Arrow data type (and, as far as I can tell, no one has asked for one before). I think a variable-length binary column should be sufficient for many purposes.

It looks like the current approach is to first load the column into Python objects, which gives you a heterogeneous list of Python objects. This list is then passed to `pa.array`. However, there is no guarantee that the list can be turned into an Arrow array, and the result depends entirely on the values: if all the values are integers you'll get an int64 array, if all the values are strings you'll get a string array, and if the values are mixed you'll get the reported exception.

If the goal is to go to Parquet and back, then the safest thing to do would be to load the column as binary and save it in Parquet as binary (with your own custom metadata to indicate that it is a JSONB field). You could also create a JSONB extension type based on the variable-length binary data type.
