#general
@gauravjindal25: I have a question on Pinot. In our company we capture user behavioral data as events on our application. Since it's clickstream data, we are looking for a platform that enables ad hoc queries on this data with low latency. Is Pinot a recommended option? Has anyone tried it? Product analytics tools such as Amplitude and Mixpanel also enable real-time analytics on clickstream data, so I'm wondering whether Pinot has similar or better technology for ad hoc analysis of event data.
@mayanks: When you say ad hoc, do you mean more dynamic slice and dice? If so, yes, Pinot is a good option
@gauravjindal25: Yes. Has anyone tried it and is willing to show a demo?
@dadelcas: Hello, it looks like the minion doesn't expose any Pinot metrics at the moment (release 0.8.0). I'm interested in monitoring task failures; is there an easy way to do this other than monitoring logs?
@mayanks: We have added some debug APIs in 0.8. @ramabaratam are the minion debug APIs in 0.8?
@ramabaratam: Yes. ```
/tasks/{taskType}/taskcounts   Fetch count of sub-tasks for each of the tasks for the given task type
/tasks/{taskType}/debug        Fetch information for all the tasks for the given task type
/tasks/task/{taskName}/debug   Fetch information for the given task name
```
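For illustration, these endpoints can be hit directly on the controller; the host, port, and task type below are placeholders, not values from the thread: ```
# Hypothetical controller address and task type -- substitute your own.
CONTROLLER=http://localhost:9000
curl -s "$CONTROLLER/tasks/SegmentGenerationAndPushTask/taskcounts"
curl -s "$CONTROLLER/tasks/SegmentGenerationAndPushTask/debug"
```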
@dadelcas: The problem is I need to integrate with existing monitoring tools; not sure this API helps with that. Cheers
@ramabaratam: I see these MinionMeter metrics, which will translate to Prometheus metrics like pinot_minion_numberTasksExecuted etc., with the label "id" set to the task type: ```
NUMBER_TASKS_EXECUTED("tasks", false),
NUMBER_TASKS_COMPLETED("tasks", false),
NUMBER_TASKS_CANCELLED("tasks", false),
NUMBER_TASKS_FAILED("tasks", false),
NUMBER_TASKS_FATAL_FAILED("tasks", false);
```
@ramabaratam: You might want to look at MinionQueryPhase ```TASK_EXECUTION``` for total execution time per task type. It will translate to a Prometheus timer metric as pinot_minion_taskExecution...
@mayanks: @dadelcas ^^ seems like we already have some metrics you can use.
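As a quick sanity check that these meters actually reach an existing monitoring stack, one could scrape the minion's metrics endpoint directly; the host, port, and path here are assumptions that depend on how the JMX/Prometheus exporter is wired up in your deployment: ```
# Hypothetical exporter address -- adjust to your deployment.
curl -s http://pinot-minion:8008/metrics | grep pinot_minion_numberTasks
```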
@dadelcas: I've also got a question about partitioning in hybrid tables. If I understand correctly, this only applies to offline tables. Does `segmentPartitionConfig` interact with the time column? The field only accepts one value at the moment, and I was wondering whether segments are generated using `timeColumnName` and further partitioned using the `segmentPartitionConfig`. If I don't specify partitioning, are the segments effectively replicated to all the servers?
@mayanks: This config helps partition data on a primary key. You typically don't want to choose the time column for this, but rather a column that appears in most queries with an equality predicate. If you don't use any partitioning or replica-group-based assignment, then each segment can go to any of the servers (n copies).
@dadelcas: Thank you Mayank. So the attribute `segmentConfig.timeColumnName` is not used in an offline table?
@dadelcas: I guess if I need to partition my data by, let's say, hour, then I need to inject a new field into my data and use it in the partition config of the offline table
@mayanks: No no, I am saying the time column already gets special treatment in terms of time-based pruning. If you partition your data on the time column, you don't need to explicitly tell Pinot. The `segmentPartitionConfig` is more for primary-key-based partitioning.
@dadelcas: Got it, thanks for clarifying that
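For reference, a `segmentPartitionConfig` keyed on a primary-key-style column looks roughly like the sketch below; the column name, partition function, and partition count are illustrative only: ```
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {
        "functionName": "Murmur",
        "numPartitions": 4
      }
    }
  }
}
```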
@sirsh: Hello - is there a way to use the controller's REST interface to submit OFFLINE table ingestion tasks? Details:
• Parquet files on S3
• Schema and table spec already created on Pinot (deployed on K8s via Argo/Helm)
I have seen there is an ingestion task that can be triggered, e.g. using the scripts/utils in the Pinot distribution, but I would like to do this directly with REST commands. I would like to apply the `SegmentCreationAndUriPush` strategy, and I would either set up something that runs on a daily or hourly schedule OR just trigger one-off tasks myself. Either works.
@g.kishore: ```curl -X POST "
@sirsh: Thank you @g.kishore - the docs suggest this is for small files or testing, because files need to be downloaded - or am I reading this wrong?
@mayanks: The ingestionFromURI endpoint is for a quickstart kind of setup.
@mayanks: Do you mean you want to schedule and control the job that generates and pushes segments?
@sirsh: yes - that is exactly what i want to do.
@g.kishore: We don't have this now, but we are thinking of a simple solution. Can you please file a ticket? I will add my thoughts to that.
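In the meantime, the usual route is the batch ingestion job launcher mentioned above. A trimmed `SegmentCreationAndUriPush` job spec might look roughly like this; the bucket paths, table name, and controller URI are placeholders: ```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://my-bucket/events/'        # placeholder
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://my-bucket/segments/'     # placeholder
pinotFSSpecs:
  - scheme: s3
    className: 'org.apache.pinot.plugin.filesystem.S3PinotFS'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'myTable'                       # placeholder
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'   # placeholder
```
It can be launched with `bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile job-spec.yaml`, e.g. from a K8s CronJob for the scheduled case.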
@barana: @barana has joined the channel
@sirsh: Hello... I have a question related to Kafka and SSL specifically. I submitted schema and REALTIME table specs, but I can see that my SSL configuration is not correct. I would like to understand, for a standard deployment of Pinot using Helm on K8s, where the SSL cert location would be expected, so I can configure SSL correctly for my table - adding a segment of my spec to the thread
@sirsh: ```
"tableIndexConfig": {
  "loadMode": "MMAP",
  "streamConfigs": {
    "streamType": "kafka",
    "security.protocol": "SSL",
    "ssl.truststore.location": "/opt/pinot/kafka.client.truststore.jks",
    "stream.kafka.topic.name": "MY-TOPIC",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.prop.auto.offset.reset": "largest",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.time": "24h",
    "realtime.segment.flush.segment.size": "100M",
    "stream.kafka.zk.broker.url": "
@sirsh: @mayanks
@mayanks: Hi @sirsh here's some info if that helps:
@sirsh: Thanks @mayanks - I actually read this one but was unsure of some actual values to use. I can keep experimenting. For example, what should the `ssl.truststore.location` be, and do I need to do any setup?
@mayanks: @slack1 ^^
@slack1: Hi @sirsh - afaik the Helm template wasn’t updated yet to handle the injection of SSL certs and keystores. The fastest way to get you to a working setup would be to create a configmap with prep’d keystore/truststore and then hack the deployment spec to include them as local volumes on the path you set up above. If you build out a more generic solution, we’d be very glad to include them in the chart as a contribution. There’s just so much to do around Pinot right now.
@sirsh: that's very useful to know - makes sense, thanks both!
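A minimal sketch of the volume wiring described above, assuming the truststore has been packaged into a configmap (or, better, a secret) named `kafka-ssl`; all names here are hypothetical: ```
# Hypothetical addition to the Pinot server pod spec.
volumes:
  - name: kafka-ssl
    configMap:
      name: kafka-ssl              # contains kafka.client.truststore.jks
containers:
  - name: server
    volumeMounts:
      - name: kafka-ssl
        mountPath: /opt/pinot/kafka.client.truststore.jks
        subPath: kafka.client.truststore.jks
```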
@weixiang.sun: What does "disabling the realtime table" mean? No streaming ingestion and no queries served? I don't see any specific documentation about it.
@mayanks: Disabling the table means stopping both. May I ask what specifically you are looking for?
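For the mechanics, the controller exposes a table state toggle over REST; the exact shape varies by release, so verify against your controller's Swagger UI. Host and table name below are placeholders: ```
# Hypothetical host/table -- check the Swagger UI of your release.
curl -X GET "http://localhost:9000/tables/myTable?state=disable&type=realtime"
```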
#random
@barana: @barana has joined the channel
#troubleshooting
@yash.agarwal: I am trying to set up a new Pinot cluster. I have a ZooKeeper cluster up. When I try to bring up the first Pinot controller, it starts and then fails with this error: ```
Pinot Controller instance [Controller_piclx1001.hq.target.com_9000] is Started...
Started Pinot [CONTROLLER] instance [Controller_piclx1001.hq.target.com_9000] at 13.884s since launch
Shutting down Pinot Service Manager with all running Pinot instances...
Trying to stop Pinot [CONTROLLER] Instance [Controller_piclx1001.hq.target.com_9000] ...
Stopping controller periodic tasks
Stopping periodic task scheduler
.
.
Instance piclx1001.hq.target.com_9000 is not leader of cluster PinotCluster due to exception happen when session check
org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
  at org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1192) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
  at org.apache.helix.manager.zk.zookeeper.ZkClient.readData(ZkClient.java:1326) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
  at org.apache.helix.manager.zk.zookeeper.ZkClient.readData(ZkClient.java:1318) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
  at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:320) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
.
.
Closing zkclient: State:CONNECTED Timeout:30000 sessionid:0x1006d21ecf40000 local:/10.59.116.133:53916 remoteserver:
```
@jackie.jxt: Did you explicitly shut down the controller, or did it shut down by itself? The error happens after the shutdown.
@karinwolok1: Hey hey! :wave: :speaker: :speaker: :speaker: We're looking for presenters for the Apache Pinot :wine_glass: meetup!!!! :smiley: :brain: Anyone have any topics they're interested in presenting or have ideas for topics you'd like to see, please DM me! :email:
@barana: @barana has joined the channel
@zineb.raiiss: Hello, I want to create a new table in Pinot. My data source doesn't have a time column; I only have STRING-type columns. I created the schema and made the table config file, but when running I got this error. Do you have an idea or a solution?
@zineb.raiiss: executing command: AddTable -tableConfigFile /tmp/pinot-quick-start/tools-table-offline.json -schemaFile /tmp/pinot-quick-start/tools-schema.json -controllerProtocol http -controllerHost 192.168.1.105 -controllerPort 9000 -user null -password [hidden] -exec Sending request:
@xiangfu0: can you share the table conf and schema file?
@zineb.raiiss: I solved the problem; I added "replication": "1" in my table config
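For anyone hitting the same error: the replication setting lives under `segmentsConfig` in the table config. A minimal offline example follows, with a made-up table name: ```
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```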
@gqian3: Hi team, we are seeing some Pinot queries with the avg function return -Infinity when the WHERE clause matches no records. Is there a way to modify the query to return null in this case?
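One workaround sketch: select the match count alongside the average and treat the average as null client-side when the count is zero. The table, column, and predicate below are made up for illustration: ```
SELECT COUNT(*) AS matched, AVG(latencyMs) AS avgLatency
FROM myTable
WHERE eventType = 'click'
```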