phpsxg opened a new issue, #14834:
URL: https://github.com/apache/arrow/issues/14834
### Describe the usage question you have. Please include as many useful details as possible.
First, I write a Parquet dataset containing 5 rows:
```
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds

dataset_name = 'test_update'
df = pd.DataFrame({'one': [-1, 3, 2.5, 2.5, 2.5],
                   'two': ['foo', 'bar', 'baz', 'foo', 'foo'],
                   'three': [True, False, True, False, False]})
table = pa.Table.from_pandas(df)
# Write the 5 rows to the dataset directory
ds.write_dataset(table, dataset_name,
                 existing_data_behavior='overwrite_or_ignore',
                 format="parquet")
```
Then I want to append two new rows, so that the dataset contains 7 rows in
total. The new data is as follows:
```
df = pd.DataFrame({'one': [1, 2],
                   'two': ['foo-insert1', 'foo-insert2'],
                   'three': [True, False]})
table = pa.Table.from_pandas(df)
ds.write_dataset(table, dataset_name,
                 # existing_data_behavior='delete_matching',
                 existing_data_behavior='overwrite_or_ignore',
                 format="parquet")
```
1. **But this overwrites the original data, leaving only the two new rows.
How can I append new data to the existing dataset?**
2. **I have another question: how can I update existing rows that match a
condition? For example:**
> For rows where one=-1 and two=foo, set three to False
- python=3.10
- pyarrow=10.0.0
### Component(s)
Parquet, Python