Hi Vinoth,

Thank you for the reply!  I'll go ahead and retarget the PR for master.

On the checkstyle plugin, I tried to set it up and must have misconfigured
something because it never showed up in the IDE.  I'm sure it was me being
in a rush to make progress so I bailed when I didn't get immediate results
:-)

The commons jar is in the project POM in the release-0.5.2 branch.

And on the Serializable - basically everything checks out fine in unit
tests and all that.  But apparently the Spark applications that use it
lean heavily on serializable classes, so if you have any instance variables
that are not serializable, you get runtime errors - basically I'd tried
to keep a DateTimeFormatter instance variable and it only blew up when
running as a Spark job.  The interesting thing was I tried to make it a
transient variable and that didn't work either, which leads me to believe
that the Spark jobs were actually deserializing those objects to use
later on - it was a nice learning curve for me.
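
A likely reason the transient attempt failed, assuming the formatter was
assigned in the constructor or at the field declaration: Java serialization
skips transient fields and does not re-run the class's constructor or field
initializers on deserialization, so the field comes back null on the
executor. Below is a minimal sketch of one common workaround, which
serializes only the pattern string and rebuilds the formatter lazily. The
class name PartitionPathFormatter and its methods are hypothetical and not
part of Hudi; roundTrip only simulates shipping the object to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not a Hudi class.
class PartitionPathFormatter implements Serializable {
    private static final long serialVersionUID = 1L;

    // Plain String, so it serializes without trouble.
    private final String pattern;

    // DateTimeFormatter is not Serializable. Marking it transient keeps it
    // out of the serialized form, which also means it is null after
    // deserialization and has to be rebuilt on first use.
    private transient DateTimeFormatter formatter;

    PartitionPathFormatter(String pattern) {
        this.pattern = pattern;
    }

    // Lazily (re)creates the formatter from the serialized pattern.
    private DateTimeFormatter formatter() {
        if (formatter == null) {
            formatter = DateTimeFormatter.ofPattern(pattern);
        }
        return formatter;
    }

    // Parses an ISO-8601 local date-time (fractional seconds are optional)
    // and renders it with the configured pattern.
    String partitionPath(String isoDateTime) {
        return LocalDateTime.parse(isoDateTime).format(formatter());
    }

    // Serializes and deserializes in memory, roughly what happens when an
    // object is sent from the driver to an executor.
    static PartitionPathFormatter roundTrip(PartitionPathFormatter original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PartitionPathFormatter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}

class TransientFormatterDemo {
    public static void main(String[] args) {
        PartitionPathFormatter copy =
            PartitionPathFormatter.roundTrip(new PartitionPathFormatter("yyyy/MM/dd"));
        System.out.println(copy.partitionPath("2020-05-04T12:23:00"));     // 2020/05/04
        System.out.println(copy.partitionPath("2020-05-04T12:23:00.123")); // 2020/05/04
    }
}
```

Unit tests usually never serialize the object, which would explain why the
problem only shows up on a real cluster; a round-trip check like the one
above can catch it earlier.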

And definitely not throwing stones on the documentation - documentation is
not easy...

Thanks again!

Allen

On Thu, May 7, 2020 at 2:40 AM Vinoth Chandar <[email protected]>
wrote:

> Hi Allen,
>
> Thanks for the valuable feedback! Can we retarget this to master.. and the
> 0.5.3 RM can backport it on top of 0.5.2..
>
> Sharing what I know on these points.
>
> - Once you install the checkstyle plugin, you can set up IntelliJ to use
> the checkstyle file as the code style, and that has been working fairly
> well for me at least.
> - Hmmm. master does not have commons ..  did you use checkstyle from master
> or 0.5.1? We disallowed some libraries like apache commons/guava since they
> cause jar/class mismatches a lot when integrating into all these query
> engines :)
> - Agree.. `public abstract class KeyGenerator implements Serializable`
> should have taken care of it, I would think. You are referring to the
> KeyGenerator impl, right?
> - Docs are definitely being worked on. Pratyaksh has the JIRA assigned for
> now I think. IMO we can add this to `writing data` page.
>
>
>
>
> On Wed, May 6, 2020 at 12:20 PM Allen Underwood
> <[email protected]> wrote:
>
> > Hello Vinoth,
> >
> > As promised, here's a PR into 0.5.2 - I think it might be worth bringing
> > that into master / 0.5.3 as well.  But I figured I'd at least get this PR
> > out there for someone to review.
> > https://github.com/apache/incubator-hudi/pull/1597
> >
> > For what it's worth, there were definitely some pain points I encountered:
> > * checkstyle.xml - not supported well at all in IntelliJ - downloaded a
> > plugin and it didn't help - had to compile, find errors, rinse / repeat
> > * There are libraries included in the pom.xml that for some reason are
> > not allowed per the checkstyle.xml - doesn't make sense
> > (org.apache.commons.*) - why have them in the project if you can't use
> > them?
> > * Would have thought everything was good to go after running unit tests,
> > but when deploying to a real cluster, found that the entire class had to
> > be serializable - would have been nice to know that beforehand as that
> > would have saved several cycles - probably worth documenting somewhere?
> > * Don't really know where I should document these changes as I only found
> > out how to do these things via Vinoth's original reply to my email -
> > would be nice if there was some sort of "extending Hudi" documentation
> > somewhere
> >
> > Hope this becomes useful for someone else.  FYI - this is working
> > perfectly for my use-case.  Unit tests show several different approaches
> > but I wouldn't mind throwing some documentation together to help folks
> out.
> >
> > Let me know if you need anything else to help move this along - surely I
> > can't be the only one that needed it!  :-)
> >
> > Allen
> >
> > On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]>
> wrote:
> >
> >> Great!
> >>
> >> On Mon, May 4, 2020 at 5:43 PM Allen Underwood
> >> <[email protected]> wrote:
> >>
> >> > Hi  Vinoth,
> >> >
> >> > Yes I was going to set some things up in the morning. I’ll let you
> know
> >> > how it turns out and if it’s worth a PR I’ll get one together.
> >> >
> >> > Thanks again for your help!
> >> >
> >> > Allen
> >> >
> >> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]>
> >> wrote:
> >> >
> >> >> Thanks both!
> >> >>
> >> >> @allen heard this many times :) hear you. You could write a small
> class
> >> >> yourself with your custom logic and throw it in there?
> >> >>
> >> >> If you think there is a way to fix the key generator in Hudi to be
> >> >> more resilient to these (e.g. taking in a list of supported patterns
> >> >> vs just the one), let us know.
> >> >>
> >> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood
> >> >> <[email protected]> wrote:
> >> >>
> >> >> > Hi Vinoth - that was extremely helpful...I almost had it working,
> >> >> > HOWEVER, it appears that some of my dates have the ms on the end and
> >> >> > others don't....so if I pick a time format with them, then the ones
> >> >> > without them fail and vice versa....Good times.
> >> >> >
> >> >> > After I figure this out I'll see if I can put this information
> >> somewhere
> >> >> > easy to find.
> >> >> >
> >> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]>
> >> >> wrote:
> >> >> >
> >> >> >> Hi Allen,
> >> >> >>
> >> >> >> You are able to configure the key generator for deltastreamer
> using
> >> >> this
> >> >> >> property (either via a file or --config )
> >> >> >> hoodie.datasource.write.keygenerator.class
> >> >> >>
> >> >> >> You might be interested in this built-in generator.
> >> >> >>
> >> >> >>
> >> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
> >> >> >> It lets you configure a field as a recordKey, and if you can parse
> >> >> >> your timestamp using Java SimpleDateFormat, you can specify the
> >> >> >> datetime field and a pattern to parse it into..
> >> >> >>
> >> >> >> Happy to make this work for you.
> >> >> >>
> >> >> >> community, any volunteers to faq/document this? :)
> >> >> >>
> >> >> >>
> >> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood
> >> >> >> <[email protected]> wrote:
> >> >> >>
> >> >> >> > I’ve tried to do my due diligence by googling / searching this
> >> >> >> > Slack and I’ve come up empty. Is there a way through configuration /
> >> >> >> > deltastreamer to extract a custom partition key? Basically I have a
> >> >> >> > datetime field in a Kafka Source that has an ISO8601 datetime….is
> >> >> >> > there a way to extract a partition value out of that? I found this
> >> >> >> > after some Googling, but this seems like it’d only be useful if I
> >> >> >> > wanted to write my own writer application:
> >> >> >> >
> >> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
> >> >> >> >
> >> >> >> > Any way to do what I need through configuration of the spark job /
> >> >> >> > hudi configuration?
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > *Allen Underwood*
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > *Allen Underwood*
> >> >> > Principal Software Engineer
> >> >> > Broadcom | Symantec Enterprise Division
> >> >> > *Mobile*: 404.808.5926
> >> >> >
> >> >>
> >> > --
> >> > *Allen Underwood*
> >> > Principal Software Engineer
> >> > Broadcom | Symantec Enterprise Division
> >> > *Mobile*: 404.808.5926
> >> >
> >>
> >
> >
> > --
> > *Allen Underwood*
> > Principal Software Engineer
> > Broadcom | Symantec Enterprise Division
> > *Mobile*: 404.808.5926
> >
>


-- 
*Allen Underwood*
Principal Software Engineer
Broadcom | Symantec Enterprise Division
*Mobile*: 404.808.5926
