phpsxg opened a new issue, #14834:
URL: https://github.com/apache/arrow/issues/14834

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   First, I save a Parquet dataset containing 5 rows:
   ```
   import pandas as pd
   import pyarrow as pa
   import pyarrow.dataset as ds

   dataset_name = 'test_update'
   df = pd.DataFrame({'one': [-1, 3, 2.5, 2.5, 2.5],
                      'two': ['foo', 'bar', 'baz', 'foo', 'foo'],
                      'three': [True, False, True, False, False]},
                     )
   table = pa.Table.from_pandas(df)
   ds.write_dataset(table, dataset_name,
       existing_data_behavior='overwrite_or_ignore',
       format="parquet")
   ```
   
   Then I want to append two new rows, so that the dataset contains 7 rows in
total. The new data is as follows:
   ```
   df = pd.DataFrame({'one': [1, 2],
                      'two': ['foo-insert1','foo-insert2'],
                      'three': [True, False]},
                     )
   
   table = pa.Table.from_pandas(df)
   ds.write_dataset(table, dataset_name,
       # existing_data_behavior='delete_matching',
       existing_data_behavior='overwrite_or_ignore',
       format="parquet")
   ```
   1. **But this overwrites the original data, leaving only the two new rows.
How can I append the new rows to the existing data instead?**
   
   2. **I have another question: how can I update rows that match a
condition? For example:**
   
   > Set `three` to False for the rows where one=-1 and two='foo'
   
   
   - python=3.10
   - pyarrow=10.0.0
   
   
   
   ### Component(s)
   
   Parquet, Python

