ghiggi commented on issue #35802:
URL: https://github.com/apache/arrow/issues/35802#issuecomment-3927466464

   I think this can be closed. Just for future me or others:
    
   ```
   import pandas as pd
   import pyarrow as pa
   import dask.dataframe as dd
   
   # Create example table
   table = pa.table({'a': [1, 2, 3], 'b': ["a", "b", "c"]})
    
   # This raises an error: ArrowInvalid: Cannot do zero copy conversion into multi-column DataFrame block
   df = table.to_pandas(types_mapper=pd.ArrowDtype, zero_copy_only=True)
   
   # Setting split_blocks=True allows zero-copy conversion to pandas
   df = table.to_pandas(
       types_mapper=pd.ArrowDtype,
       zero_copy_only=True,
       split_blocks=True,
   )
   
   # If working with a dask dataframe ...
   # - Let's write a parquet file
   filepath = "/tmp/example.parquet"
   df.to_parquet(filepath)
   
   # This fails with ArrowInvalid: Cannot do zero copy conversion into multi-column DataFrame block
   arrow_to_pandas = {
       "types_mapper": pd.ArrowDtype,
       "zero_copy_only": True,
       }
   df = dd.read_parquet(filepath,
                        engine="pyarrow", 
                        dtype_backend="pyarrow",
                        arrow_to_pandas = arrow_to_pandas)
   
   # This works
   arrow_to_pandas = {
       "types_mapper": pd.ArrowDtype,
       "zero_copy_only": False,
       "split_blocks": True, # !!!
       }
   df = dd.read_parquet(filepath,
                        engine="pyarrow", 
                        dtype_backend="pyarrow",
                        arrow_to_pandas = arrow_to_pandas)
   
   df.compute()
   ```
   These issues also addressed the problem: 
   
   - https://github.com/apache/arrow/issues/38644
   - https://github.com/apache/arrow/issues/39194
   
   

