files in HDFS:
/apps/hive/warehouse/emp_multi_partkey/part1=A/part2=2018
manually created table:
CREATE EXTERNAL TABLE `emp_multi_partkey`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`emp_id` string,
`part_col` string)
PARTITIONED BY (
`part1` string,
`part2` string)
In the dataset these two columns exist too, with the partition path column derived as:
concat('part1=',part1,'/part2=',part2) as part_col
(here, for example, part1 = 'A' and part2 = '2018')
I am able to update and delete records. Will there be any gaps if this
process is followed?
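
For reference, a minimal Scala sketch of the write side of this flow (the
input DataFrame `df`, the save path and table name mirror the ones above;
the format/import package depends on the Hudi build, com.uber.hoodie vs
org.apache.hudi, so treat those as assumptions):

import org.apache.hudi.DataSourceWriteOptions // com.uber.hoodie on older builds
import org.apache.spark.sql.functions.{col, concat, lit}

// derive the composite partition path column, matching part_col above
val withPartCol = df.withColumn("part_col",
  concat(lit("part1="), col("part1"), lit("/part2="), col("part2")))

withPartCol.write
  .format("org.apache.hudi") // "com.uber.hoodie" on older builds
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "emp_id")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_col")
  .option("hoodie.table.name", "emp_multi_partkey")
  .mode("append")
  .save("/apps/hive/warehouse/emp_multi_partkey")
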
On Tue, Apr 30, 2019 at 11:36 AM SATISH SIDNAKOPPA <
[email protected]> wrote:
> Hi Vinoth,
>
> I created the multi-partition table as below.
>
> in the dataset ---> concat('part1=',SUBSTR(emp_name,1,1),'/part2=','2018') as part_col
> in spark.write, the Hudi option is set ------>
> .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"part_col")
>
> files in HDFS:
>
> /apps/hive/warehouse/emp_multi_partkey/part1=A/part2=2018
>
> alter table hudi.emp_multi_partkey add partition(part1='A',part2='2018') ;
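>
> The ALTER above could also be issued from the job after each write; a
> minimal sketch, assuming a Hive-enabled SparkSession `spark` and a
> DataFrame `withPartCol` holding the batch just written:
>
> // register each newly written partition in the Hive table (sketch only)
> withPartCol.select("part1", "part2").distinct().collect().foreach { row =>
>   val (p1, p2) = (row.getString(0), row.getString(1))
>   spark.sql(s"ALTER TABLE hudi.emp_multi_partkey ADD IF NOT EXISTS " +
>     s"PARTITION (part1='$p1', part2='$p2')")
> }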
>
>
>
>
> On Mon, Apr 29, 2019 at 8:30 PM Vinoth Chandar <[email protected]> wrote:
>
>> Hi Satish,
>>
>> That's because the default KeyGenerator class only reads in a single field
>> to partition on. What you are expecting is a composite key.
>>
>> Nishith has one in the test suite PR
>>
>> https://github.com/apache/incubator-hudi/pull/623/files#diff-8814d5eb596f19bc9a87e419453fd7c8
>>
>> We plan to add this to the main code. For now, you can copy the class and
>> see if it solves your need? KeyGenerator is pluggable anyway.
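>>
>> Roughly, such a composite KeyGenerator could look like the sketch below
>> (import/package names vary across Hudi versions and are assumptions
>> here; compare with the actual class in the PR):
>>
>> import org.apache.avro.generic.GenericRecord
>> import org.apache.hudi.common.model.HoodieKey
>> import org.apache.hudi.common.util.TypedProperties // location varies by version
>> import org.apache.hudi.keygen.KeyGenerator // location varies by version
>>
>> // record key from a single field; partition path built Hive-style
>> // (field1=value1/field2=value2) from a comma-separated field list
>> class MultiPartKeyGenerator(props: TypedProperties) extends KeyGenerator(props) {
>>   private val recordKeyField =
>>     props.getString("hoodie.datasource.write.recordkey.field")
>>   private val partitionFields =
>>     props.getString("hoodie.datasource.write.partitionpath.field")
>>       .split(",").map(_.trim)
>>
>>   override def getKey(record: GenericRecord): HoodieKey = {
>>     val recordKey = String.valueOf(record.get(recordKeyField))
>>     val partitionPath = partitionFields
>>       .map(f => s"$f=${String.valueOf(record.get(f))}")
>>       .mkString("/")
>>     new HoodieKey(recordKey, partitionPath)
>>   }
>> }
>>
>> It would then be plugged in with
>> .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
>> classOf[MultiPartKeyGenerator].getName).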
>>
>> Thanks
>> Vinoth
>>
>> On Mon, Apr 29, 2019 at 7:20 AM SATISH SIDNAKOPPA <
>> [email protected]> wrote:
>>
>> > Hi Team,
>> >
>> >
>> > I have to store data by department and region.
>> > /dept=HR/region=AP
>> > /dept=OPS/region=AP
>> > /dept=HR/region=SA
>> > /dept=OPS/region=SA
>> >
>> > so the partitioned table created will have multiple partition keys.
>> >
>> >
>> > I tried passing the value as comma-separated (dept,region):
>> >
>> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"dept,region")
>> >
>> > and dot-separated:
>> >
>> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"dept.region")
>> >
>> > but the partitions were not created in HDFS. All the data was added to the
>> > default partition.
>> >
>> >
>> > Could you guide me on the format for passing multiple partition fields when
>> > writing a Hudi dataset with Spark?
>> >
>> > regards
>> > Satish S
>> >
>>
>