[ 
https://issues.apache.org/jira/browse/ARROW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2633:
----------------------------------
    Description: 
 
I am trying to read a Parquet file into a pandas DataFrame, do some
manipulation, and write it back to the same file. However, it seems the file
is not accessible for writing after the first read in the same function.

It only works if I don't perform STEP 1 below. Is there any way to unlock the
file?

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# STEP 1: Read entire parquet file
# (raw string so that '\a' is not interpreted as an escape sequence)
pq_file = pq.ParquetFile(r'\dev\abc.parquet')
exp_df = pq_file.read(nthreads=1, use_pandas_metadata=True).to_pandas()
# STEP 2: Change some data in dataframe
#
# STEP 3: Write merged dataframe
pyarrow_table = pa.Table.from_pandas(exp_df)
pq.write_table(pyarrow_table, r'\dev\abc.parquet', compression='none')
{code}

Error:

{code}
File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 943, in write_table
 **kwargs)
File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 286, in __init__
 **options)
File "_parquet.pyx", line 832, in pyarrow._parquet.ParquetWriter.__cinit__
File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Failed to open local file: \dev\abc.parquet , error: Invalid argument
{code}

  was:
 
I am trying to read a Parquet file into a pandas DataFrame, do some
manipulation, and write it back to the same file. However, it seems the file
is not accessible for writing after the first read in the same function.

It only works if I don't perform STEP 1 below. Is there any way to unlock the
file?

{{#STEP 1: Read entire parquet file pq_file = 
pq.ParquetFile('\dev\abc.parquet') exp_df = pq_file.read(nthreads=1, 
use_pandas_metadata=True).to_pandas() #STEP 2:  # Change some data in dataframe 
#STEP 3: write merged dataframe pyarrow_table = pa.Table.from_pandas(exp_df) 
pq.write_table(pyarrow_table, '\dev\abc.parquet',compression='none',)}}

Error:

{{File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 943, in 
write_table **kwargs) File "C:\Python36\lib\site-packages\pyarrow\parquet.py", 
line 286, in __init__ **options) File "_parquet.pyx", line 832, in 
pyarrow._parquet.ParquetWriter.__cinit__ File "error.pxi", line 79, in 
pyarrow.lib.check_status pyarrow.lib.ArrowIOError: Failed to open local file: 
\dev\abc.parquet , error: Invalid argument}}


> Parquet file not accessible to write after first read using PyArrow
> -------------------------------------------------------------------
>
>                 Key: ARROW-2633
>                 URL: https://issues.apache.org/jira/browse/ARROW-2633
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Suman
>            Priority: Major
>
>  
> I am trying to read a Parquet file into a pandas DataFrame, do some
> manipulation, and write it back to the same file. However, it seems the file
> is not accessible for writing after the first read in the same function.
> It only works if I don't perform STEP 1 below. Is there any way to unlock the
> file?
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> # STEP 1: Read entire parquet file
> # (raw string so that '\a' is not interpreted as an escape sequence)
> pq_file = pq.ParquetFile(r'\dev\abc.parquet')
> exp_df = pq_file.read(nthreads=1, use_pandas_metadata=True).to_pandas()
> # STEP 2: Change some data in dataframe
> #
> # STEP 3: Write merged dataframe
> pyarrow_table = pa.Table.from_pandas(exp_df)
> pq.write_table(pyarrow_table, r'\dev\abc.parquet', compression='none')
> {code}
> Error:
> {code}
> File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 943, in write_table
>  **kwargs)
> File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 286, in __init__
>  **options)
> File "_parquet.pyx", line 832, in pyarrow._parquet.ParquetWriter.__cinit__
> File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Failed to open local file: \dev\abc.parquet , error: Invalid argument
> {code}
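
A possible workaround, offered only as a sketch: the report does not establish
why the write fails, but if the cause is that the reader still holds the file
open (an assumption), one option is to read through a handle that is closed
explicitly and then write to a temporary file that replaces the original.
The helper name rewrite_in_place and its read/transform/write callables are
hypothetical and not part of pyarrow; only the standard library is used.

```python
import os
import tempfile


def rewrite_in_place(path, read, transform, write):
    """Read `path`, transform the data, then replace `path` with the result.

    `read(file_obj)` and `write(data, file_obj)` are caller-supplied
    callables (hypothetical names, not part of any library).
    """
    # Read through a handle that is closed as soon as the block exits,
    # so nothing keeps the original file open during the write.
    with open(path, "rb") as src:
        data = read(src)

    new_data = transform(data)

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst:
            write(new_data, dst)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return new_data
```

With pyarrow, read could wrap pq.read_table(f).to_pandas() and write could
wrap pq.write_table(pa.Table.from_pandas(df), f), assuming both calls accept
an open Python file object in the installed version.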



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
