[influxdb] Re: kapacitor record query does not return same points as line protocol query

Heath Raftery Fri, 06 Jan 2017 16:20:35 -0800

Here's a more copy/paste-able version to try out:

influx -execute 'DROP DATABASE d'
influx -execute 'CREATE DATABASE d'
influx -database="d" -execute 'INSERT m a=1 1'
influx -database="d" -execute 'INSERT m b=2 2'
influx -database="d" -execute 'INSERT m,t1=3 a=3,b=3 3'
influx -database="d" -execute 'INSERT m,t2=4 a=4,b=4 4'
influx -database="d" -execute 'SELECT * FROM m'


That returns the measurement contents as expected:

name: m
time a b t1 t2
---- - - -- --
1    1 
2      2 
3    3 3 3 
4    4 4    4

Now try the same with kapacitor record query:

echo "stream
    |from()
        .measurement('m')
    |log()
        .prefix('RECORD_ISSUE')" >record_issue.tick
        
kapacitor define record_issue -type stream -tick record_issue.tick -dbrp d.autogen
rid=$(kapacitor record query -query $'SELECT * FROM "d"."autogen"."m"' -type stream)
kapacitor replay -task record_issue -recording $rid -rec-time


sudo tail -15 /var/log/kapacitor/kapacitor.log | grep "RECORD_ISSUE"

Only the 3 points with an 'a' field appear in the log, and none of the tags 
appear.
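The record, replay and log-check steps above get repeated for every query variant below, so they can be wrapped in a small shell function. This is a minimal sketch that assumes the record_issue task is already defined as above and relies only on the kapacitor flags already used in this post; the function name replay_query and the KAPACITOR_LOG variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: record the points returned by an InfluxQL query,
# replay them through the record_issue task, then show the matching log lines.
# KAPACITOR_LOG is an assumption; point it at wherever kapacitor.log lives.
KAPACITOR_LOG=${KAPACITOR_LOG:-/var/log/kapacitor/kapacitor.log}

replay_query() {
    local query=$1 rid
    # 'kapacitor record query' prints the recording ID on stdout.
    rid=$(kapacitor record query -query "$query" -type stream) || return 1
    # Replay the recording through the task, using the recorded timestamps.
    kapacitor replay -task record_issue -recording "$rid" -rec-time || return 1
    # Same check as above: last 15 log lines, filtered on the task's prefix.
    sudo tail -15 "$KAPACITOR_LOG" | grep "RECORD_ISSUE"
}
```

For example, the GROUP BY * variant becomes replay_query 'SELECT * FROM "d"."autogen"."m" GROUP BY *'. Plain single quotes are enough for these queries; the $'...' form only matters when the query contains backslash escapes.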

The same thing happens when the fields are specified explicitly:

rid=$(kapacitor record query -query $'SELECT a,b FROM "d"."autogen"."m"' -type stream)
kapacitor replay -task record_issue -recording $rid -rec-time


sudo tail -15 /var/log/kapacitor/kapacitor.log | grep "RECORD_ISSUE"


Tags can be recovered by adding GROUP BY *:

rid=$(kapacitor record query -query $'SELECT * FROM "d"."autogen"."m" GROUP BY *' -type stream)
kapacitor replay -task record_issue -recording $rid -rec-time


sudo tail -15 /var/log/kapacitor/kapacitor.log | grep "RECORD_ISSUE"
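Comparing the replayed points with the SELECT * FROM m table is easier when each log line is cut down to its time, tags and fields. A minimal sketch using grep and sed, assuming the log() node's JSON keeps the "Tags", "Fields", "Time" key order seen in the log excerpt quoted below and that tag and field sets contain no nested braces; the function name summarize_points is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Kapacitor log lines on stdin and print one
# "time tags=... fields=..." line per point logged with the RECORD_ISSUE prefix.
# Assumes the JSON keys appear in the order Tags, Fields, Time and that the
# tag and field objects contain no nested braces.
summarize_points() {
    grep 'RECORD_ISSUE' |
        sed -n 's/.*"Tags":\({[^}]*}\),"Fields":\({[^}]*}\),"Time":"\([^"]*\)".*/\3 tags=\1 fields=\2/p'
}
```

Usage: sudo tail -15 /var/log/kapacitor/kapacitor.log | summarize_points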


But I still can't get all the points. They can be forced to appear by 
changing the order of the fields in the SELECT clause and using multiple 
queries, but then there are duplicates and they're out of order:

rid=$(kapacitor record query -query $'SELECT a,b FROM "d"."autogen"."m" GROUP BY *; SELECT b,a FROM "d"."autogen"."m" GROUP BY *' -type stream)
kapacitor replay -task record_issue -recording $rid -rec-time


sudo tail -15 /var/log/kapacitor/kapacitor.log | grep "RECORD_ISSUE"
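To see exactly which points are duplicated (and whether any are missing), it is enough to count how often each "Time" value appears among the RECORD_ISSUE lines. A minimal sketch, assuming the "Time":"..." key shown in the log excerpt quoted below; the function name count_by_time is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "time count" for every point logged with the
# RECORD_ISSUE prefix. A count above 1 means the point was replayed more
# than once. Output is sorted by timestamp string, not by log order.
count_by_time() {
    grep 'RECORD_ISSUE' |
        sed -n 's/.*"Time":"\([^"]*\)".*/\1/p' |
        LC_ALL=C sort |          # group identical timestamps for uniq
        uniq -c |
        awk '{print $2, $1}'     # reorder to "time count"
}
```

Usage: sudo tail -15 /var/log/kapacitor/kapacitor.log | count_by_time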



Please give these a go and let me know if you experience the same.

Regards,
Heath


On Saturday, January 7, 2017 at 3:06:08 AM UTC+11, Heath Raftery wrote:
>
> I can't make sense of the kapacitor record query syntax. The documentation 
> only covers the HTTP API, not the CLI, but the Custom Anomaly Detection 
> <https://docs.influxdata.com/kapacitor/v1.1/examples/anomaly_detection/> 
> example gets one started, and running kapacitor record query without 
> arguments prints some usage guidelines.
>
> Everything suggests that the query is a standard InfluxQL query string. 
> But there are some confusing differences. Observe:
>
> $ influx
> > CREATE DATABASE record
> > USE record
> > INSERT points a=1
> > INSERT points b=2
> > SELECT * FROM points
> name: points
> time a b
> ---- - -
> 1483716972843739426 1 
> 1483716980505185982 2
>
> > exit
> $ cat record_issue.tick
> stream
>     |from()
>         .measurement('points')
>     |log()
>         .prefix('POINTS_ISSUE')
> $ kapacitor define record_issue -type stream -tick record_issue.tick -dbrp record.autogen
> $ rid=$(kapacitor record query -query $'SELECT * FROM "record"."autogen"."points"' -type stream)
> $ kapacitor replay -task record_issue -recording $rid -rec-time
>
> At this point the log file shows:
>
> [record_issue:log2] 2017/01/07 02:40:46 I! POINTS_ISSUE 
> {"Name":"points","Database":"record","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{},"Fields":{"a":1},"Time":"2017-01-06T15:36:12.843739426Z"}
>
> which indicates that only the points with an 'a' field have been recorded. 
> Executing the same query at the CLI returns both points.
>
> Other differences:
>
> $ rid=$(kapacitor record query -query $'SELECT a,b FROM "record"."autogen"."points"' -type stream)
>
> still only returns the 'a' field, not the 'b' field, when replayed.
>
> $ influx
> > USE record
> > INSERT points,t1=3 a=3,b=3
> > INSERT points,t2=4 a=4,b=4
> > SELECT * FROM points
> name: points
> time a b t1 t2
> ---- - - -- --
> 1483716972843739426 1 
> 1483716980505185982 2 
> 1483717340694062989 3 3 3 
> 1483717349891893752 4 4 4
> > exit
> $ rid=$(kapacitor record query -query $'SELECT a,b FROM "record"."autogen"."points"' -type stream)
>
> only returns 'a' and no tags.
>
> $ rid=$(kapacitor record query -query $'SELECT * FROM "record"."autogen"."points" GROUP BY *' -type stream)
>
> Returns all the tags this time, but not the point without an 'a' field 
> (i.e. the 1483716980505185982 point).
>
>
> Three questions:
>
>    1. Why does each field key need to be named in the SELECT statement, 
>    unlike CLI queries, which return all the fields listed in the SELECT 
>    statement?
>    2. How do you return all fields from a measurement, in order, given 
>    that * doesn't work? (Multiple statements are not suitable.)
>    3. Why is GROUP BY required to return the tags when it's unnecessary 
>    at the CLI?
>
>
> By the way, the record feature is invaluable. Being able to tweak 
> Kapacitor scripts and test them on historical data before deploying them 
> in practice is crucial.
>
> Regards,
> Heath
>

-- 
Remember to include the version number!
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxData" group.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/1387ae3f-7e79-4607-af29-a0f8be02c03a%40googlegroups.com.
