Ah, no problem at all. We are all here to make it better :)

Let me spend some time over the weekend on the PR and these suggestions, and
see if I can make some changes upstream.

On Thu, May 7, 2020 at 6:44 AM Allen Underwood
<[email protected]> wrote:

> Hi Vinoth,
>
> Thank you for the reply!  I'll go ahead and retarget the PR for master.
>
> On the checkstyle plugin, I tried to set it up and must have misconfigured
> something because it never showed up in the IDE. I'm sure it was me being
> in a rush to make progress, so I bailed when I didn't get immediate
> results :-)
>
> The commons jar is in the project POM in the release-0.5.2 branch.
>
> And on the Serializable point - basically everything checks out fine in
> unit tests and all that. But apparently Spark applications lean heavily on
> serializable classes, so if you have any instance variables that are not
> serializable, you get runtime errors. I'd tried to keep a DateTimeFormatter
> instance variable, and it only blew up when running as a Spark job.
> Interestingly, making it a transient variable didn't work either, which
> leads me to believe that the Spark jobs were actually serializing the key
> generator and shipping it to the executors, where a transient field comes
> back null. It was a nice learning curve for me.
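A minimal sketch of one common workaround, assuming a key generator that needs a `java.time.format.DateTimeFormatter`: keep only the pattern strings as serializable state, mark the formatters `transient`, and rebuild them lazily on first use, because Java deserialization leaves transient fields `null`. `PartitionPathFormatter` and `roundTrip` are hypothetical names, not Hudi classes, and `roundTrip` only approximates how Spark ships serialized objects to executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper a custom key generator might hold; not a Hudi class.
class PartitionPathFormatter implements Serializable {
  private static final long serialVersionUID = 1L;

  // Plain Strings serialize fine, so these are the state that travels to executors.
  private final String inputPattern;
  private final String outputPattern;

  // DateTimeFormatter is not Serializable, so these must be transient.
  // Deserialization leaves them null, hence the lazy rebuild in toPartitionPath.
  private transient DateTimeFormatter inputFormatter;
  private transient DateTimeFormatter outputFormatter;

  PartitionPathFormatter(String inputPattern, String outputPattern) {
    this.inputPattern = inputPattern;
    this.outputPattern = outputPattern;
  }

  String toPartitionPath(String timestamp) {
    if (inputFormatter == null || outputFormatter == null) {
      // Built on first use, both after construction and after deserialization.
      inputFormatter = DateTimeFormatter.ofPattern(inputPattern);
      outputFormatter = DateTimeFormatter.ofPattern(outputPattern);
    }
    return LocalDateTime.parse(timestamp, inputFormatter).format(outputFormatter);
  }

  // Serializes and deserializes an object in memory, roughly approximating
  // what happens when Spark ships a task's objects to an executor.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    PartitionPathFormatter copy =
        roundTrip(new PartitionPathFormatter("yyyy-MM-dd'T'HH:mm:ss", "yyyy/MM/dd"));
    System.out.println(copy.toPartitionPath("2020-05-07T06:44:00")); // prints 2020/05/07
  }
}
```

The design choice is to serialize only what is cheap and safe (the pattern strings) and treat the formatter as a derived cache that each executor rebuilds for itself.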
>
> And definitely not throwing stones at the documentation - documentation is
> not easy...
>
> Thanks again!
>
> Allen
>
> On Thu, May 7, 2020 at 2:40 AM Vinoth Chandar <
> [email protected]> wrote:
>
>> Hi Allen,
>>
>> Thanks for the valuable feedback! Can we retarget this to master? The
>> 0.5.3 RM can then backport it on top of 0.5.2.
>>
>> Sharing what I know on these points.
>>
>> - Once you install the checkstyle plugin, you can set up IntelliJ to use
>> the checkstyle file as the code style, and that has been working fairly
>> well for me at least.
>> - Hmmm. master does not have commons... did you use checkstyle from master
>> or 0.5.1? We disallowed some libraries like Apache Commons/Guava since they
>> cause jar/class mismatches a lot when integrating into all these query
>> engines :)
>> - Agree. `public abstract class KeyGenerator implements Serializable` should
>> have taken care of it, I would think. You are referring to the
>> KeyGenerator impl, right?
>> - Docs are definitely being worked on. Pratyaksh has the JIRA assigned for
>> now, I think. IMO we can add this to the `writing data` page.
>>
>>
>>
>>
>> On Wed, May 6, 2020 at 12:20 PM Allen Underwood
>> <[email protected]> wrote:
>>
>> > Hello Vinoth,
>> >
>> > As promised, here's a PR into 0.5.2. I think it might be worth bringing
>> > it into master / 0.5.3 as well, but I figured I'd at least get this PR
>> > out there for someone to review.
>> > https://github.com/apache/incubator-hudi/pull/1597
>> >
>> > For what it's worth, there were definitely some pain points I encountered:
>> > * checkstyle.xml - not supported well at all in IntelliJ. I downloaded a
>> > plugin and it didn't help, so I had to compile, find errors, rinse /
>> > repeat.
>> > * There are libraries included in the pom.xml (org.apache.commons.*) that
>> > for some reason are not allowed per the checkstyle.xml, which doesn't make
>> > sense - why have them in the project if you can't use them?
>> > * I would have thought everything was good to go after running unit
>> > tests, but when deploying to a real cluster, I found that the entire class
>> > had to be serializable. It would have been nice to know that beforehand,
>> > as that would have saved several cycles - probably worth documenting
>> > somewhere?
>> > * I don't really know where I should document these changes, as I only
>> > found out how to do these things via Vinoth's original reply to my email.
>> > It would be nice if there were some sort of "extending Hudi"
>> > documentation somewhere.
>> > Hope this becomes useful for someone else. FYI - this is working
>> > perfectly for my use case. The unit tests show several different
>> > approaches, but I wouldn't mind throwing some documentation together to
>> > help folks out.
>> >
>> > Let me know if you need anything else to help move this along - surely I
>> > can't be the only one that needed it!  :-)
>> >
>> > Allen
>> >
>> > On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:
>> >
>> >> Great!
>> >>
>> >> On Mon, May 4, 2020 at 5:43 PM Allen Underwood
>> >> <[email protected]> wrote:
>> >>
>> >> > Hi Vinoth,
>> >> >
>> >> > Yes, I was going to set some things up in the morning. I'll let you
>> >> > know how it turns out, and if it's worth a PR, I'll get one together.
>> >> >
>> >> > Thanks again for your help!
>> >> >
>> >> > Allen
>> >> >
>> >> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]> wrote:
>> >> >
>> >> >> Thanks both!
>> >> >>
>> >> >> @allen, heard this many times :) and I hear you. You could write a
>> >> >> small class yourself with your custom logic and throw it in there?
>> >> >>
>> >> >> If you think there is a way to fix the key generator in Hudi to be
>> >> >> more resilient to these (e.g. taking in a list of supported patterns
>> >> >> vs. just the one), let us know.
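The "list of supported patterns" idea could look roughly like this: try each configured pattern in order and use the first one that parses. This is a sketch with a hypothetical `MultiPatternParser` class, not an existing Hudi option. Note that it holds `DateTimeFormatter` instances, which are not Serializable, so inside a key generator it would need the same transient treatment as the formatter Allen describes above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hudi: tries each configured pattern in order.
class MultiPatternParser {
  private final List<DateTimeFormatter> formatters = new ArrayList<>();

  MultiPatternParser(List<String> patterns) {
    for (String pattern : patterns) {
      formatters.add(DateTimeFormatter.ofPattern(pattern));
    }
  }

  LocalDateTime parse(String value) {
    for (DateTimeFormatter formatter : formatters) {
      try {
        return LocalDateTime.parse(value, formatter);
      } catch (DateTimeParseException e) {
        // This pattern did not match; fall through to the next one.
      }
    }
    throw new IllegalArgumentException("No configured pattern matches: " + value);
  }

  public static void main(String[] args) {
    MultiPatternParser parser = new MultiPatternParser(
        Arrays.asList("yyyy-MM-dd'T'HH:mm:ss.SSS", "yyyy-MM-dd'T'HH:mm:ss"));
    System.out.println(parser.parse("2020-05-04T09:11:00.123")); // 2020-05-04T09:11:00.123
    System.out.println(parser.parse("2020-05-04T09:11:00"));     // 2020-05-04T09:11
  }
}
```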
>> >> >>
>> >> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood
>> >> >> <[email protected]> wrote:
>> >> >>
>> >> >> > Hi Vinoth - that was extremely helpful... I almost had it working,
>> >> >> > HOWEVER, it appears some of my dates have milliseconds on the end
>> >> >> > and others don't... so if I add a time format with milliseconds,
>> >> >> > the ones without them fail, and vice versa... Good times.
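For ISO8601 values with an offset, `java.time` can also handle the milliseconds-or-not case with a single formatter, because `DateTimeFormatter.ISO_OFFSET_DATE_TIME` treats the fraction of a second as optional. This would only apply to a custom class like the one Vinoth suggests writing; `SimpleDateFormat`, which the built-in generator is described as using, has no optional sections. A sketch, assuming a `yyyy/MM/dd` partition layout and UTC day boundaries; `IsoPartitionPath` is a hypothetical name.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for a custom key generator, not part of Hudi.
class IsoPartitionPath {
  // Static fields are not serialized, so a static formatter avoids the
  // Serializable problem. The yyyy/MM/dd layout is an assumption.
  private static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd");

  static String partitionPath(String isoTimestamp) {
    // ISO_OFFSET_DATE_TIME accepts an optional fraction of a second,
    // so values with and without milliseconds both parse.
    OffsetDateTime ts = OffsetDateTime.parse(isoTimestamp, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    // Normalize to UTC so the day boundary does not depend on the source offset.
    return ts.withOffsetSameInstant(ZoneOffset.UTC).format(PARTITION_FORMAT);
  }

  public static void main(String[] args) {
    System.out.println(partitionPath("2020-05-04T09:11:00Z"));      // 2020/05/04
    System.out.println(partitionPath("2020-05-04T09:11:00.123Z"));  // 2020/05/04
    System.out.println(partitionPath("2020-05-04T23:30:00-04:00")); // 2020/05/05
  }
}
```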
>> >> >> >
>> >> >> > After I figure this out, I'll see if I can put this information
>> >> >> > somewhere easy to find.
>> >> >> >
>> >> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi Allen,
>> >> >> >>
>> >> >> >> You can configure the key generator for DeltaStreamer using this
>> >> >> >> property (either via a file or --config):
>> >> >> >> hoodie.datasource.write.keygenerator.class
>> >> >> >>
>> >> >> >> You might be interested in this built-in generator:
>> >> >> >>
>> >> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
>> >> >> >> It lets you configure a field as the record key, and if you can
>> >> >> >> parse your timestamp using Java's SimpleDateFormat, you can specify
>> >> >> >> the datetime field and a pattern to parse it into.
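To make the SimpleDateFormat step concrete, here is a small sketch that parses a timestamp string with one pattern and formats it with another to get a partition path such as `2020/05/04`. It is an illustration only, not the TimestampBasedKeyGenerator source; `toPartitionPath` is a hypothetical name, and pinning both formats to UTC is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustration only: parse with one SimpleDateFormat pattern, format with another.
class SimpleDateFormatPartition {
  static String toPartitionPath(String value, String inputPattern, String outputPattern) {
    // SimpleDateFormat is not thread-safe, so fresh instances are created per call.
    SimpleDateFormat input = new SimpleDateFormat(inputPattern);
    SimpleDateFormat output = new SimpleDateFormat(outputPattern);
    // Assumption: pin both to UTC so results do not depend on the JVM default zone.
    TimeZone utc = TimeZone.getTimeZone("UTC");
    input.setTimeZone(utc);
    output.setTimeZone(utc);
    try {
      Date parsed = input.parse(value);
      return output.format(parsed);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Cannot parse '" + value + "' with " + inputPattern, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        toPartitionPath("2020-05-04T09:11:00", "yyyy-MM-dd'T'HH:mm:ss", "yyyy/MM/dd"));
  }
}
```

Creating the formats per call trades a little allocation for not having to worry about SimpleDateFormat's shared mutable state across threads.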
>> >> >> >>
>> >> >> >> Happy to make this work for you.
>> >> >> >>
>> >> >> >> Community, any volunteers to FAQ/document this? :)
>> >> >> >>
>> >> >> >>
>> >> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood
>> >> >> >> <[email protected]> wrote:
>> >> >> >>
>> >> >> >> > I've tried to do my due diligence by googling / searching this
>> >> >> >> > Slack, and I've come up empty. Is there a way through
>> >> >> >> > configuration / DeltaStreamer to extract a custom partition key?
>> >> >> >> > Basically, I have a datetime field in a Kafka source that holds an
>> >> >> >> > ISO8601 datetime... is there a way to extract a partition value
>> >> >> >> > out of that? I found this after some Googling, but it seems like
>> >> >> >> > it'd only be useful if I wanted to write my own writer
>> >> >> >> > application:
>> >> >> >> >
>> >> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
>> >> >> >> >
>> >> >> >> > Any way to do what I need through configuration of the Spark job /
>> >> >> >> > Hudi configuration?
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > *Allen Underwood*
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > *Allen Underwood*
>> >> >> > Principal Software Engineer
>> >> >> > Broadcom | Symantec Enterprise Division
>> >> >> > *Mobile*: 404.808.5926
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>
