Re: Druid Auto Field Type Detection

Furkan KAMACI Mon, 11 Feb 2019 22:52:14 -0800

Hi Gian,

I've created an issue for it:
https://github.com/apache/incubator-druid/issues/7027


Could you add a comment there on where I can start implementing such a feature?
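
To make the discussion concrete, here is a minimal sketch of the deferred detection idea from the quoted thread below: look at every value seen for a column before settling on a type, and widen the type when values conflict. It is written in Python only for brevity and is not Druid code. The names value_type and infer_column_types, the widening order long -> double -> string, and the string fallback for all-null columns are assumptions made for illustration.

```python
import json

# Assumed widening order for this sketch: long < double < string.
_RANK = {"long": 0, "double": 1, "string": 2}


def value_type(value):
    """Map one parsed JSON value to a column type, or None for null."""
    if value is None:
        return None  # a null carries no type information
    if isinstance(value, bool):
        return "string"  # bool is a subclass of int in Python; keep it out of "long"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, and anything else, fall back to string


def infer_column_types(rows, skip=("ts",)):
    """Pick the narrowest type that fits every value observed per column."""
    types = {}
    for row in rows:
        for name, value in row.items():
            if name in skip:
                continue
            observed = value_type(value)
            if observed is None:
                types.setdefault(name, None)  # remember the column, decide later
                continue
            current = types.get(name)
            if current is None or _RANK[observed] > _RANK[current]:
                types[name] = observed
    # Columns that only ever held null fall back to string (an assumption).
    return {name: t or "string" for name, t in types.items()}


# The two sample rows from the original message in this thread.
sample = [
    '{"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}',
    '{"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}',
]
detected = infer_column_types(json.loads(line) for line in sample)
# properties-key1 is detected as "string" and properties-key2 as "long".
```

With this rule, a field that is 3 in the first row and "foo" in the second ends up as string, a field whose first value is null takes its type from later values, and 3 versus "3" are kept apart because the JSON parser already distinguishes them.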

Kind Regards,
Furkan KAMACI

On Tue, Feb 12, 2019 at 1:43 AM Gian Merlino <gianmerl...@gmail.com> wrote:

> Yeah that's a good point. Maybe we should store some extra information
> about what the type was in the original input.
>
> On Sat, Jan 26, 2019 at 4:04 AM Furkan KAMACI <furkankam...@gmail.com>
> wrote:
>
> > Hi Gian,
> >
> > The same problem applies to null fields too. When the first record is null,
> > it will not be possible to detect such a field's type.
> >
> > However, the problem is different in my case. You may have an ad-hoc field
> > which is not defined at the beginning. Such a field should have a strict
> > type, but that type is not known at the beginning. In your example, we may
> > define such a field as Integer and throw an error or skip an entry which has
> > a value of "foo", because the field was initialized as Integer. On the other
> > hand, sending a datum as:
> >
> > field: 3
> >
> > and
> >
> > field: "3"
> >
> > may be treated differently. The second one could be a String, but the first
> > one should be an Integer.
> >
> > I think that Solr could be an example for us of such a schemaless mode.
> > What do you think?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Fri, Jan 25, 2019 at 8:56 PM Gian Merlino <g...@apache.org> wrote:
> >
> > > Hey Furkan,
> > >
> > > Right now when Druid detects dimensions (so-called "schemaless" mode, what
> > > you get when you have an empty dimensions list at ingestion time), it
> > > assumes they are all strings. It would definitely be better if it did some
> > > analysis on the incoming data and chose the most appropriate type. I think
> > > the main consideration here is that Druid has to pick a type as soon as it
> > > sees a new column, but it might not get it right just by looking at the
> > > first record. Imagine some JSON data where you have a field that is the
> > > number 3 for the first row Druid sees, but the string "foo" in the second.
> > > The right type would be string, but Druid wouldn't know that when it gets
> > > the first row.
> > >
> > > Maybe it would work to have some mechanism where auto-detected fields are
> > > ingested as strings initially into IncrementalIndex, and then potentially
> > > converted to a different type when written to disk.
> > >
> > > On Thu, Jan 10, 2019 at 12:43 AM Furkan KAMACI <furkankam...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I can define auto type detection for timestamp as follows:
> > > >
> > > > "timestampSpec" : {
> > > >      "format" : "auto",
> > > >      "column" : "ts"
> > > > }
> > > >
> > > > I cannot detect field types in a similar manner via parseSpec. I mean:
> > > >
> > > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}
> > > >
> > > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}
> > > >
> > > > Both properties-key1 and properties-key2 are indexed as String. I expect
> > > > properties-key2 to be indexed as Integer in Druid.
> > > >
> > > > So, is there any mechanism in Druid for automatic field type detection
> > > > for a newly created field? If not, I would like to implement such a
> > > > feature.
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > >
> >
>
