Hi Gian, I've created an issue for it: https://github.com/apache/incubator-druid/issues/7027
Could you add a comment on where I could start implementing such a feature?

Kind Regards,
Furkan KAMACI

On Tue, Feb 12, 2019 at 1:43 AM Gian Merlino <gianmerl...@gmail.com> wrote:

> Yeah that's a good point. Maybe we should store some extra information
> about what the type was in the original input.
>
> On Sat, Jan 26, 2019 at 4:04 AM Furkan KAMACI <furkankam...@gmail.com>
> wrote:
>
> > Hi Gian,
> >
> > The same problem applies to null fields too. When the first record is
> > null, it will not be possible to detect such a field's type.
> >
> > However, the problem is different in my case. You may have an ad-hoc
> > field which is not defined at the beginning. Such a field should have a
> > strict type, but the type is not known at the beginning. In your example
> > case, we may define such a field as Integer and throw an error or skip
> > an entry which has a value of "foo", because the field was initialized
> > as Integer. On the other hand, sending a datum as:
> >
> > field: 3
> >
> > and
> >
> > field: "3"
> >
> > may be treated differently. The second one could be String but the first
> > one should be Integer.
> >
> > I think that Solr could be an example for us of such a schemaless mode.
> > What do you think?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Fri, Jan 25, 2019 at 8:56 PM Gian Merlino <g...@apache.org> wrote:
> >
> > > Hey Furkan,
> > >
> > > Right now when Druid detects dimensions (so called "schemaless" mode,
> > > what you get when you have an empty dimensions list at ingestion
> > > time), it assumes they are all strings. It would definitely be better
> > > if it did some analysis on the incoming data and chose the most
> > > appropriate type. I think the main consideration here is that Druid
> > > has to pick a type as soon as it sees a new column, but it might not
> > > get it right just by looking at the first record. Imagine some JSON
> > > data where you have a field that is the number 3 for the first row
> > > Druid sees, but the string "foo" in the second.
> > > The right type would be string, but Druid wouldn't know that when it
> > > gets the first row.
> > >
> > > Maybe it would work to do some mechanism where auto-detected fields
> > > are ingested as strings initially into IncrementalIndex, and then
> > > potentially converted to a different type when written to disk.
> > >
> > > On Thu, Jan 10, 2019 at 12:43 AM Furkan KAMACI <furkankam...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I can define auto type detection for timestamp as follows:
> > > >
> > > > "timestampSpec" : {
> > > >   "format" : "auto",
> > > >   "column" : "ts"
> > > > }
> > > >
> > > > In a similar manner, I cannot detect field type via parseSpec. I mean:
> > > >
> > > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid1","eventName":"app-x","properties-key1":"123"}
> > > >
> > > > {"ts":"2018-01-01T03:35:45Z","app_token":"guid2","eventName":"app-x","properties-key2":123}
> > > >
> > > > Both properties-key1 and properties-key2 are indexed as String. I
> > > > expect properties-key2 to be indexed as Integer in Druid.
> > > >
> > > > So, is there any mechanism in Druid for auto field type detection on
> > > > a newly created field? If not, I would like to implement such a
> > > > feature.
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
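
The deferred-typing idea discussed in this thread (keep track of the original input type of each value, and only settle on a column type once more than the first row has been seen) might be sketched roughly as below. This is a minimal Python illustration, not Druid code: the function names resolve_column_type and resolve_schema are hypothetical, and the rules for choosing between "long", "double", and "string" are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List


def resolve_column_type(values: Iterable[Any]) -> str:
    """Pick one type for a column from the original JSON types of its values.

    Assumed rules (illustrative only): nulls are ignored, an all-integer
    column becomes "long", an integer/float mix becomes "double", and
    anything else, including an all-null column, falls back to "string".
    """
    seen = set()
    for value in values:
        if value is None:
            continue  # a null carries no type information
        if isinstance(value, bool):
            seen.add("string")  # bool is a subclass of int in Python; keep it out of "long"
        elif isinstance(value, int):
            seen.add("long")
        elif isinstance(value, float):
            seen.add("double")
        else:
            seen.add("string")
    if not seen or "string" in seen:
        return "string"
    if "double" in seen:
        return "double"
    return "long"


def resolve_schema(rows: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Buffer every auto-detected field, then resolve each column type once."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: resolve_column_type(vals) for name, vals in columns.items()}


# The two sample events from the original message.
rows = [
    {"ts": "2018-01-01T03:35:45Z", "app_token": "guid1",
     "eventName": "app-x", "properties-key1": "123"},
    {"ts": "2018-01-01T03:35:45Z", "app_token": "guid2",
     "eventName": "app-x", "properties-key2": 123},
]
schema = resolve_schema(rows)
```

Under these assumed rules, properties-key1 stays a string and properties-key2 resolves to long, while a column that holds 3 in one row and "foo" in the next resolves to string, and a leading null does not block detection.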