doc987 opened a new issue #6004: Automatically Determine Ingestion Schema
URL: https://github.com/apache/incubator-druid/issues/6004
 
 
   Druid should be able to automatically determine the schema to use when the 
schema is implied by the data received.
   
   Suppose data is being provided by a collection agent such as Collectd or
Telegraf.  The specific fields that will be present may not be known ahead of
time: they depend on which metrics those agents collect, and they can change
across software versions.  Druid should still be able to locate the timestamp,
dimensions, and metrics.
   
   For example, the Telegraf output formats are documented here:
https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
Because these formats carry their own structure (measurement name, tags,
fields, and timestamp), InfluxDB does not need to be given a schema in advance
in order to load the data.
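As a rough illustration of why no schema is needed, the sketch below reads records in Telegraf's JSON output format (`name`, `tags`, `fields`, `timestamp`) and derives a schema from the data itself. The measurement name and tags become string dimensions, numeric fields become metrics (promoted from long to double once any fractional value is seen), and non-numeric fields fall back to dimensions. The helper `infer_schema` and the long/double type names are assumptions made for this example, not Druid code. The first sample record is modeled on the example in Telegraf's JSON serializer documentation; the second is made up.

```python
import json


def infer_schema(lines):
    """Hypothetical helper: derive a schema from Telegraf JSON records."""
    dimensions = set()
    metrics = {}
    for line in lines:
        record = json.loads(line)
        # The measurement name and every tag key are treated as string dimensions.
        dimensions.add("name")
        dimensions.update(record.get("tags", {}))
        for field, value in record.get("fields", {}).items():
            # bool is a subclass of int in Python, so exclude it explicitly;
            # non-numeric fields are kept as dimensions rather than metrics.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                dimensions.add(field)
            elif isinstance(value, float) or metrics.get(field) == "double":
                # Once a field has been seen with a fractional value, keep it double.
                metrics[field] = "double"
            else:
                metrics[field] = "long"
    return {
        "timestamp": "timestamp",  # Telegraf JSON puts the time under this key
        "dimensions": sorted(dimensions),
        "metrics": metrics,
    }


sample = [
    '{"fields":{"field_1":30,"field_2":4},"name":"docker",'
    '"tags":{"host":"raynor"},"timestamp":1458229140}',
    '{"fields":{"field_1":1.5,"state":"idle"},"name":"cpu",'
    '"tags":{"host":"raynor","cpu":"cpu0"},"timestamp":1458229150}',
]
schema = infer_schema(sample)
print(schema)
```

The sketch simply unions everything it has seen, so new tags or fields introduced by a newer agent version show up without any manual edits. A real implementation would still need a policy for fields that change type between records.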
   
   The Collectd output format documentation is less clear.  Format options
include PUTVAL, JSON, and Graphite:
https://collectd.org/wiki/index.php/Table_of_Plugins
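Collectd's JSON format (as produced, for example, by its write_http plugin) sends each value list with fixed identifier keys (`host`, `plugin`, `plugin_instance`, `type`, `type_instance`) alongside parallel `values`, `dstypes`, and `dsnames` arrays. One possible approach, sketched below under that assumption, is to flatten every data source into a narrow row so that the column set stays the same no matter which plugins are enabled. The `flatten_collectd` helper and the sample payload (a `load` value list with illustrative numbers) are hypothetical.

```python
import json

# Identifier keys that collectd attaches to every value list in its JSON format.
ID_FIELDS = ("host", "plugin", "plugin_instance", "type", "type_instance")


def flatten_collectd(payload):
    """Hypothetical helper: turn one collectd JSON payload into narrow rows."""
    rows = []
    for value_list in json.loads(payload):
        # values, dstypes and dsnames are parallel arrays, one entry per data source.
        for dsname, dstype, value in zip(
            value_list["dsnames"], value_list["dstypes"], value_list["values"]
        ):
            row = {"timestamp": value_list["time"]}
            for field in ID_FIELDS:
                row[field] = value_list.get(field, "")
            row.update(dsname=dsname, dstype=dstype, value=value)
            rows.append(row)
    return rows


payload = json.dumps([{
    "values": [0.5, 0.4, 0.3],
    "dstypes": ["gauge", "gauge", "gauge"],
    "dsnames": ["shortterm", "midterm", "longterm"],
    "time": 1280959128,
    "interval": 10,
    "host": "example-host",
    "plugin": "load",
    "plugin_instance": "",
    "type": "load",
    "type_instance": "",
}])
rows = flatten_collectd(payload)
for row in rows:
    print(row)
```

Every row has the same columns (`timestamp`, the five identifier keys, `dsname`, `dstype`, `value`), so one fixed schema could cover any plugin. The trade-off is one row per data source instead of one wide row per sample.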
   
   Is there a way to ingest this kind of data without first determining every
possible field that could be collected for every monitored unit and then
writing a corresponding ingestion schema?  That exercise would also have to be
repeated after every software version change.

