Hello Team,
I'm glad to be connecting with the people who help with 'pyarrow' usage.
My question: the snippet below creates data files in S3 with partition keys,
but the partition columns themselves are not written into the data files. Is
there a way to include the partition columns in the data files, so that a
customer querying a data file directly sees all columns plus the partition
columns?

Kindly help me with this.


targetKey = self.s3BucketName + '/' + outputDirectory + targetDirectory[:-1]

log.debug("Single or multiple partition columns being passed: %s -> %s",
          [partitionCols], targetKey)
pq.write_to_dataset(table=table,
                    root_path=targetKey,
                    row_group_size=self.chunkSizeLimit,
                    partition_cols=[partitionCols],
                    filesystem=s3,
                    compression='snappy',
                    partition_filename_cb=lambda x: '-'.join(str(v) for v in x) + '.parquet')

-- 
Regards,
Mahesha S
Cell:8015127140
