Iceberg Partition by via Spark

Mayank Thirani Mon, 28 Mar 2022 18:11:52 -0700

Hi Team,

We are using Spark to create some sample tables for testing, to see how the
metadata files and folders look when we use "partition by". The documentation
we followed:
https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--drop-partition-field


We created a table partitioned by one column (say, city) and can see the
metadata file and the per-city folders created in S3, using the command
below:
create table samplesMainTestPartitionCity partition by (city) as select *
from sampleTable limit 1000

[image: image.png]
*00000-fc....json* is the metadata file generated for this table.

Next, we tried to drop the partition field (city) using the command below:
ALTER TABLE nessie.samplesMainTestPartitionCity DROP PARTITION FIELD city

We got a new metadata file for it (*00001-9d.....json* is the new one).
However, we can still see the partition folders shown above. Our expectation
was that no such folders would remain.
We then tried to add a new partition field after the drop, using the command
below:
ALTER TABLE nessie.samplesMainTestPartitionCity ADD PARTITION FIELD state

We got a new metadata file for it (*00002-26.....json*), but no new folders
were generated for state.
This looks incorrect to us. Can you please explain?
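
To compare the three metadata files side by side, a small script along the
following lines could be used. It is only a sketch: it assumes the files have
been downloaded from S3 to a local directory, and it relies on the
"partition-specs" and "default-spec-id" fields of the Iceberg table metadata
JSON. The function names are hypothetical and not part of any Iceberg API.

```python
import json
import sys


def summarize_partition_specs(metadata: dict) -> dict:
    """Summarize the partition specs recorded in one table metadata document.

    Returns the default spec id plus, for every spec, a list of
    (field name, transform) pairs.
    """
    specs = {
        spec["spec-id"]: [
            (field.get("name"), field.get("transform"))
            for field in spec.get("fields", [])
        ]
        for spec in metadata.get("partition-specs", [])
    }
    return {"default-spec-id": metadata.get("default-spec-id"), "specs": specs}


def load_and_summarize(path: str) -> dict:
    """Read a locally downloaded *.metadata.json file and summarize it."""
    with open(path) as handle:
        return summarize_partition_specs(json.load(handle))


if __name__ == "__main__":
    # Paths are passed on the command line, one per metadata file.
    for path in sys.argv[1:]:
        print(path, load_and_summarize(path))
```

Running it with the three attached metadata files as arguments would print
the default spec id and the fields of each spec per file, which should show
how the partition spec changed after each ALTER TABLE.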


-- 
Thanks
-Mayank

00000-fc0eba45-fdc1-4cb3-97b6-41cc8337274e.metadata.json
Description: application/json

00001-9d4b492d-cbcf-4c75-a72c-3d73c8e481b6.metadata.json
Description: application/json

00002-26f945db-22c9-4c75-a7a9-49e1af77f400.metadata.json
Description: application/json
