westonpace commented on issue #30648:
URL: https://github.com/apache/arrow/issues/30648#issuecomment-1409155308

   @khoatrandata if I understand that issue correctly, the user is trying to
load a column of type jsonb into Arrow.  There is no equivalent Arrow data
type (and as far as I can tell no one has ever asked for one before).  I think a
variable-length binary column should be sufficient for many purposes.
   
   It looks like the current approach is to first load the column into Python
objects (this will give you a heterogeneous list of Python objects).  This list
is then passed to `pa.array`.  However, there is no guarantee that the list can
be converted to an Arrow array, and the resulting type depends on the data: if
all the values are integers you'll get an int64 array, if all the values are
strings you'll get a string array, and if the values are mixed you'll get the
reported exception.
   
   If the goal is to go to Parquet and back then the safest thing to do would
be to load the column as binary and save it in Parquet as binary (with your own
custom metadata to indicate it is a JSONB field).
   
   You could also create a JSONB extension type based on the variable-length
binary data type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

