Hi Allen,

Thanks for the valuable feedback! Could we retarget this to master? The
0.5.3 RM can then backport it on top of 0.5.2.

Sharing what I know on these points.

- Once you install the checkstyle plugin, you can set up IntelliJ to use the
checkstyle file as the code style, and that has been working fairly well for
me at least.
- Hmm, master does not have commons. Did you use the checkstyle from master
or 0.5.1? We disallowed some libraries like Apache Commons/Guava since they
cause a lot of jar/class mismatches when integrating into all these query
engines :)
- Agreed. `public abstract class KeyGenerator implements Serializable` should
have taken care of it, I would think. You are referring to the
KeyGenerator impl, right?
- Docs are definitely being worked on. Pratyaksh has the JIRA assigned for
now, I think. IMO we can add this to the `writing data` page.
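
To make the serializability and "list of supported patterns" points
concrete, here is a rough sketch of the parsing logic only. It is plain Java
and deliberately does not extend KeyGenerator or touch any Hudi API; the
class name, method name, and the example patterns are all hypothetical, and
the assumption is that timestamps look like 2020-05-04T12:23:45Z with or
without milliseconds.

```java
import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper, NOT part of Hudi: turns a timestamp string into a
// partition path by trying several SimpleDateFormat patterns in order.
final class MultiPatternPartitionPath implements Serializable {
  private static final long serialVersionUID = 1L;

  // Only Strings are stored, so the object serializes cleanly when it is
  // shipped to executors; formatters are created per call because
  // SimpleDateFormat is not thread-safe.
  private final List<String> inputPatterns;
  private final String outputPattern;

  MultiPatternPartitionPath(List<String> inputPatterns, String outputPattern) {
    this.inputPatterns = new ArrayList<>(inputPatterns);
    this.outputPattern = outputPattern;
  }

  // Returns the formatted partition path for the first pattern that parses.
  String partitionPathFor(String timestamp) {
    for (String pattern : inputPatterns) {
      SimpleDateFormat parser = utcFormat(pattern);
      parser.setLenient(false);
      try {
        Date parsed = parser.parse(timestamp);
        return utcFormat(outputPattern).format(parsed);
      } catch (ParseException e) {
        // This pattern did not match; fall through to the next one.
      }
    }
    throw new IllegalArgumentException(
        "No configured pattern matches timestamp: " + timestamp);
  }

  // Assumption: all timestamps are interpreted and formatted in UTC.
  private static SimpleDateFormat utcFormat(String pattern) {
    SimpleDateFormat format = new SimpleDateFormat(pattern);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format;
  }
}
```

With the two patterns yyyy-MM-dd'T'HH:mm:ss.SSS'Z' and
yyyy-MM-dd'T'HH:mm:ss'Z' and an output pattern of yyyy/MM/dd, both flavors
of timestamp should land in the same partition. Something like this could be
called from a custom key generator, but how it plugs in depends on the
KeyGenerator version you build against.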




On Wed, May 6, 2020 at 12:20 PM Allen Underwood
<[email protected]> wrote:

> Hello Vinoth,
>
> As promised, here's a PR into 0.5.2 - I think it might be worth bringing
> that into master / 0.5.3 as well.  But I figured I'd at least get this PR
> out there for someone to review.
> https://github.com/apache/incubator-hudi/pull/1597
>
> For what it's worth, there were definitely some pain points I encountered:
> * checkstyle.xml - not supported well at all in IntelliJ - downloaded a
> plugin and it didn't help - had to compile, find errors, rinse / repeat
> * There are libraries included in the pom.xml that for some reason are not
> allowed per the checkstyle.xml - doesn't make sense (org.apache.commons.*)
> - why have them in the project if you can't use them?
> * Would have thought everything was good to go after running unit tests,
> but when deploying to a real cluster, found that the entire class had to be
> serializable - would have been nice to know that before-hand as that would
> have saved several cycles. - probably worth documenting somewhere?
> * Don't really know where I should document these changes as I only found
> out how to do these things via Vinoth's original reply to my email - would
> be nice if there was some sort of "extending Hudi" documentation somewhere
>
> Hope this becomes useful for someone else.  FYI - this is working
> perfectly for my use-case.  Unit tests show several different approaches
> but I wouldn't mind throwing some documentation together to help folks out.
>
> Let me know if you need anything else to help move this along - surely I
> can't be the only one that needed it!  :-)
>
> Allen
>
> On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:
>
>> Great!
>>
>> On Mon, May 4, 2020 at 5:43 PM Allen Underwood
>> <[email protected]> wrote:
>>
>> > Hi  Vinoth,
>> >
>> > Yes I was going to set some things up in the morning. I’ll let you know
>> > how it turns out and if it’s worth a PR I’ll get one together.
>> >
>> > Thanks again for your help!
>> >
>> > Allen
>> >
>> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]>
>> wrote:
>> >
>> >> Thanks both!
>> >>
>> >> @allen heard this many times :) hear you. You could write a small class
>> >> yourself with your custom logic and throw it in there?
>> >>
>> >> If you think there is a way to fix the key generator in Hudi to be more
>> >> resilient to these (e.g. taking in a list of supported patterns vs. just
>> >> the one), let us know.
>> >>
>> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood
>> >> <[email protected]> wrote:
>> >>
>> >> > Hi Vinoth - that was extremely helpful... I almost had it working;
>> >> > HOWEVER, it appears that some of my dates have the ms on the end and
>> >> > others don't... so if I pick a time format that includes them, the
>> >> > ones without them fail, and vice versa... Good times.
>> >> >
>> >> > After I figure this out I'll see if I can put this information
>> >> > somewhere easy to find.
>> >> >
>> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]>
>> >> wrote:
>> >> >
>> >> >> Hi Allen,
>> >> >>
>> >> >> You are able to configure the key generator for deltastreamer using
>> >> >> this property (either via a file or --config):
>> >> >> hoodie.datasource.write.keygenerator.class
>> >> >>
>> >> >> You might be interested in this built-in generator.
>> >> >>
>> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
>> >> >>
>> >> >> It lets you configure a field as a recordKey, and if you can parse
>> >> >> your timestamp using Java SimpleDateFormat, you can specify the
>> >> >> datetime field and a pattern to parse it into.
>> >> >>
>> >> >> Happy to make this work for you.
>> >> >>
>> >> >> community, any volunteers to faq/document this? :)
>> >> >>
>> >> >>
>> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood
>> >> >> <[email protected]> wrote:
>> >> >>
>> >> >> > I’ve tried to do my due diligence by googling / searching this
>> >> >> > slack and I’ve come up empty. Is there a way through configuration /
>> >> >> > deltastreamer to extract a custom partition key? Basically I have a
>> >> >> > datetime field in a Kafka Source that has an ISO8601 datetime… is
>> >> >> > there a way to extract a partition value out of that? I found this
>> >> >> > after some Googling, but this seems like it’d only be useful if I
>> >> >> > wanted to write my own writer application:
>> >> >> >
>> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
>> >> >> >
>> >> >> > Any way to do what I need through configuration of the spark job /
>> >> >> > hudi configuration?
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > *Allen Underwood*
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > *Allen Underwood*
>> >> > Principal Software Engineer
>> >> > Broadcom | Symantec Enterprise Division
>> >> > *Mobile*: 404.808.5926
>> >> >
>> >>
>> > --
>> > *Allen Underwood*
>> > Principal Software Engineer
>> > Broadcom | Symantec Enterprise Division
>> > *Mobile*: 404.808.5926
>> >
>>
>
>
> --
> *Allen Underwood*
> Principal Software Engineer
> Broadcom | Symantec Enterprise Division
> *Mobile*: 404.808.5926
>
