Re: Generic JDBC Sink

Hari Shreedharan Thu, 12 Dec 2013 11:02:40 -0800

Haha, no. I have just been too swamped to start a meaningful discussion. If you
have time, you can start it yourself; you don’t need to wait for me.



Thanks,
Hari


On Thursday, December 12, 2013 at 9:40 AM, Jeremy Karlson wrote:

> Did I fall asleep at the wheel for a bit and miss the discussion on
> contributed sources / sinks?
>  
> -- Jeremy
>  
>  
>  
> On Thu, Nov 28, 2013 at 11:04 AM, Hari Shreedharan <[email protected]> wrote:
>  
> > I think we could add this to Flume as a contrib module (rather than to core
> > Flume itself). There is no contrib module yet, but I will start a
> > discussion about it on the dev list early next week, and we can take it
> > from there.
> >  
> >  
> > Hari
> >  
> > On Thursday, November 28, 2013, Jeremy Karlson wrote:
> >  
> > > I suppose that really depends on the usage scenario. There are a hundred
> > > things that may affect the ability of the Flume chain to keep up with
> > > incoming data, only one of which is the sink being a JDBC connection. I
> > > think for cases like mine, where the data is structured and of a
> > > reasonable volume, a JDBC connection makes sense.
> > >  
> > > I guess what I'm saying is that if someone uses it without thinking about
> > > or testing what they're doing with it, that's not a problem with JDBC,
> > > the sink, or Flume. It's a problem with the operator. :-P
> > >  
> > > -- Jeremy
> > >  
> > >  
> > >  
> > > On Thu, Nov 28, 2013 at 8:33 AM, Steve Morin <[email protected]> wrote:
> > >  
> > > > I think the biggest problem is not that people wouldn't want to use it,
> > > > but that at many moderate volumes the data wouldn't be written to the DB
> > > > fast enough to drain the channels.
> > > >  
> > > > I'll follow the ticket, thanks.
> > > >  
> > > >  
> > > > On Thu, Nov 28, 2013 at 8:17 AM, Jeremy Karlson <[email protected]> wrote:
> > > >  
> > > > > Hi Steve,
> > > > >  
> > > > > I’ve submitted the sink for review here:
> > > > >  
> > > > > http://issues.apache.org/jira/browse/FLUME-2256
> > > > >  
> > > > > If it’s something that interests you, I encourage you to apply the
> > > > > patch and let me know if it meets your needs or if you find problems.
> > > > >  
> > > > > So far there has been no movement on it, but it’s only been a couple of
> > > > > days. If Flume doesn’t want it (for whatever reason), I’ll just take off
> > > > > all of the Apache headers and put it up on GitHub under a similar
> > > > > license. It’ll get open sourced one way or another, but I think folding
> > > > > it into Flume makes the most sense.
> > > > >  
> > > > > -- Jeremy
> > > > >  
> > > > >  
> > > > > On Nov 28, 2013, at 7:39, Steve Morin <[email protected]> wrote:
> > > > >  
> > > > > Jeremy,
> > > > > I am interested in a JDBC Flume sink. Are you open sourcing it?
> > > > > -Steve
> > > > >  
> > > > >  
> > > > > On Tue, Nov 26, 2013 at 8:52 PM, Jeremy Karlson <[email protected]> wrote:
> > > > >  
> > > > > > Is there any interest in a generic JDBC sink?
> > > > > >  
> > > > > > Over the last few days I decided to try writing one. I have something
> > > > > > that still needs more testing, but it seems to be working.
> > > > > >  
> > > > > > Since the config file is how you’d interact with it, here’s a working
> > > > > > example from my source tree:
> > > > > >  
> > > > > > a.sinks.k.type=jdbc
> > > > > > a.sinks.k.channel=c
> > > > > > a.sinks.k.driver=com.mysql.jdbc.Driver
> > > > > > a.sinks.k.url=jdbc:mysql://localhost:8889/flume
> > > > > > a.sinks.k.user=username
> > > > > > a.sinks.k.password=password
> > > > > > a.sinks.k.batchSize=100
> > > > > > a.sinks.k.sql=insert into twitter (body, timestamp) values (${body:string}, ${header.timestamp:long})
> > > > > >  
> > > > > > The interesting part is the SQL statement. You can put anything you
> > > > > > want in there; it will get converted to a prepared statement on
> > > > > > execution. The Ant-ish ${...} tokens are parsed at startup and replaced
> > > > > > with prepared-statement parameters.
> > > > > >  
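A minimal Java sketch of that startup step, assuming a token runs from "${" to the first "}" (so a config string containing "}" would not be supported). SqlTemplate is a hypothetical name, not taken from the FLUME-2256 patch: it swaps each token for a JDBC "?" marker and keeps the token texts in placeholder order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the FLUME-2256 patch): turns a templated SQL
// string into prepared-statement SQL plus the ordered list of token bodies.
final class SqlTemplate {
    // Matches ${...}; assumes a token never contains '}' itself.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]*)\\}");

    final String preparedSql;  // SQL with every token replaced by '?'
    final List<String> tokens; // tokens.get(i) binds to JDBC parameter i + 1

    private SqlTemplate(String preparedSql, List<String> tokens) {
        this.preparedSql = preparedSql;
        this.tokens = tokens;
    }

    // Intended to run once, when the sink is configured.
    static SqlTemplate parse(String sql) {
        Matcher m = TOKEN.matcher(sql);
        StringBuffer out = new StringBuffer();
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group(1).trim());
            m.appendReplacement(out, "?"); // '?' is the JDBC parameter marker
        }
        m.appendTail(out);
        return new SqlTemplate(out.toString(), Collections.unmodifiableList(tokens));
    }
}
```

Applied to the sql property from the example config, parse(...) yields "insert into twitter (body, timestamp) values (?, ?)" with the tokens body:string and header.timestamp:long, in that order.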
> > > > > > Each token has three parts. For example, in:
> > > > > >
> > > > > > ${body:string(UTF-8)}
> > > > > >
> > > > > > The first part is the place in the event to get the value from
> > > > > > (“body”, “header.foo”, or “custom”). The second part (“string”) is a
> > > > > > type identifier that determines how the value is converted into a JDBC
> > > > > > parameter. The third part (“UTF-8”) is a configuration string for that
> > > > > > type, if needed.
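One way such a token could be split, sketched in Java under stated assumptions: the source ends at the first colon, and the config is everything between the first "(" and a ")" that closes the token, which lets date patterns such as HH:mm:ss contain colons. TokenSpec and headerName() are hypothetical; the patch's actual parser may behave differently.

```java
// Hypothetical parsed form of a token body such as "body:string(UTF-8)".
final class TokenSpec {
    final String source; // "body", "header.<name>", or "custom"
    final String type;   // e.g. "string" or "long"; a class name for "custom"
    final String config; // text inside the parentheses, or null if absent

    private TokenSpec(String source, String type, String config) {
        this.source = source;
        this.type = type;
        this.config = config;
    }

    static TokenSpec parse(String token) {
        int colon = token.indexOf(':'); // assumed: the source never contains ':'
        if (colon <= 0) {
            throw new IllegalArgumentException("expected source:type[(config)], got: " + token);
        }
        String source = token.substring(0, colon).trim();
        String rest = token.substring(colon + 1).trim();
        String type = rest;
        String config = null;
        int open = rest.indexOf('(');
        if (open >= 0) {
            if (!rest.endsWith(")")) {
                throw new IllegalArgumentException("unclosed '(' in token: " + token);
            }
            type = rest.substring(0, open).trim();
            config = rest.substring(open + 1, rest.length() - 1);
        }
        if (type.isEmpty()) {
            throw new IllegalArgumentException("missing type in token: " + token);
        }
        return new TokenSpec(source, type, config);
    }

    // For "header.<name>" sources returns <name>; otherwise null.
    String headerName() {
        return source.startsWith("header.") ? source.substring("header.".length()) : null;
    }
}
```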
> > > > > >
> > > > > > As for types, so far I’ve defined:
> > > > > >
> > > > > > body: string (with optional charset encoding), bytearray
> > > > > > header: string, long, int, float, double, date (with mandatory date
> > > > > > format and optional timezone)
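A sketch of what the header date type might do with its mandatory format and optional time zone, using java.text.SimpleDateFormat and java.sql.Timestamp. The thread doesn't say how the format and zone are written inside the token's config string, so this hypothetical HeaderDateConverter simply takes them as separate constructor arguments.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical converter for a "header.<name>:date(...)" token.
final class HeaderDateConverter {
    private final SimpleDateFormat format;

    HeaderDateConverter(String pattern, String timeZoneId) {
        if (pattern == null || pattern.isEmpty()) {
            throw new IllegalArgumentException("a date format is mandatory");
        }
        format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject values such as month 13 instead of rolling over
        if (timeZoneId != null) {
            // TimeZone.getTimeZone silently falls back to GMT for unknown IDs.
            format.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        }
        // Without a zone, SimpleDateFormat uses the JVM's default time zone.
    }

    // SimpleDateFormat is not thread-safe, so access is synchronized.
    synchronized Timestamp convert(String headerValue) throws ParseException {
        return new Timestamp(format.parse(headerValue).getTime());
    }
}
```

Making the zone optional but explicit matters because a header like "2013-11-26 20:52:00" carries no offset; without a zone the stored instant depends on where the Flume agent runs.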
> > > > > >  
> > > > > > Additionally, if none of those make you happy, you can define your own
> > > > > > parameter converters:
> > > > > >  
> > > > > > ${custom:com.company.foo.MyConverter(optionaltextconfig)}
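The thread doesn't show the converter contract, so the ParameterConverter interface below is purely an assumption for illustration: configure(...) receives the optional text inside the parentheses, and convert(...) returns a value the sink could hand to PreparedStatement.setObject. Converters.load (also hypothetical) shows how a custom:<class name>(config) token might be resolved by reflection; the named class would need an accessible no-argument constructor.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical converter contract; the real interface in the patch may differ.
interface ParameterConverter {
    // Receives the optional text inside the token's parentheses (may be null).
    void configure(String config);

    // Produces one JDBC parameter value from an event's headers and body.
    Object convert(Map<String, String> headers, byte[] body);
}

// Example converter mirroring "body:string(charset)"; the UTF-8 default is an assumption.
final class BodyStringConverter implements ParameterConverter {
    private Charset charset = StandardCharsets.UTF_8;

    @Override
    public void configure(String config) {
        if (config != null && !config.isEmpty()) {
            charset = Charset.forName(config);
        }
    }

    @Override
    public Object convert(Map<String, String> headers, byte[] body) {
        return new String(body, charset);
    }
}

final class Converters {
    // Resolves a "custom:<class name>(config)" token by reflection.
    static ParameterConverter load(String className, String config)
            throws ReflectiveOperationException {
        // asSubclass throws ClassCastException if the class is not a converter.
        Class<? extends ParameterConverter> cls =
                Class.forName(className).asSubclass(ParameterConverter.class);
        ParameterConverter converter = cls.getDeclaredConstructor().newInstance();
        converter.configure(config);
        return converter;
    }
}
```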
> > > > > >  
> > > > > > I know there is still room for improvement, but I’d like to get some
> > > > > > feedback and bug fixes, and maybe get it included, before I do a bunch
> > > > > > of useless work. If there is interest, how would you like it submitted
> > > > > > for review or inclusion?
> > > > > >  
> > > > > > -- Jeremy
