jim-ngoo commented on issue #2201:
URL:
https://github.com/apache/iceberg-python/issues/2201#issuecomment-3065486606
> > ### Question
> > For my use case, I have a daily cron job that batch processes and appends
> > data, but I only want a single snapshot record after the whole process. I
> > tried to do the appends under a single transaction, but multiple snapshots
> > are still created. Can pyiceberg provide an option so that only 1 snapshot
> > is generated under a transaction?
>
> Can you share what function you are using to do the append?

So here's some sample code:
```py
import pyarrow as pa

from pyiceberg.catalog.memory import InMemoryCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import LongType, NestedField

if __name__ == "__main__":
    # Set up an in-memory catalog with a single-column table.
    catalog = InMemoryCatalog(name="test_catalog")
    catalog.create_namespace_if_not_exists("test_ns")
    table = catalog.create_table_if_not_exists(
        "test_ns.test_table",
        schema=Schema(NestedField(field_id=1, name="a", field_type=LongType())),
    )

    # Two appends inside one transaction.
    with table.transaction() as transaction:
        transaction.append(pa.Table.from_pylist([{"a": 1}]))
        transaction.append(pa.Table.from_pylist([{"a": 2}]))

    # Print the snapshot log: one entry per snapshot.
    for entry in table.metadata.snapshot_log:
        print(entry)
```
Two snapshots are created, one per `append` call. Is there a way to append in
batches but have only one snapshot created at the end?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]