#general
@humengyuk18: Can multiple tables consume from the same Kafka topic using the low-level consumer in Pinot?
@mayanks: Yes. Each table is an independent entity. You can configure the same Kafka topic for multiple tables.
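For illustration, a minimal sketch of what the shared-topic setup could look like (the table name, topic, and broker list here are hypothetical; schema and segmentsConfig are omitted). Each table keeps its own consuming segments and offsets:
```
{
  "tableName": "clicks_by_user_REALTIME",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "clickstream-events",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  }
}
```
A second table, e.g. `clicks_by_page_REALTIME`, would carry identical `streamConfigs` pointing at the same `stream.kafka.topic.name`.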
@mayanks: Added to faq:
@humengyuk18: Thanks, I will try it.
@patidar.rahul8392: Hi Team, I am trying to use HDFS as deep storage and have created controller.conf, server.conf and broker.conf files. Kindly review and provide your valuable suggestions on the attached file. Please suggest/guide if anything needs to change or be added.
@sleepythread: @patidar.rahul8392 you have added the script in the wrong file. e.g.: For the broker, create a file broker.conf and add the following in that file:
```
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
```
After this, run the following in your terminal (or put it in a shell script file) to start the broker:
```
export HADOOP_HOME=/user/local/hadoop
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/tmp/gc_log/
export PINOT_VERSION=0.7.1
export PINOT_DISTRIBUTION_DIR=/tmp/my_data_2/apache-pinot-incubating-0.7.1-bin
export SERVER_CONF_DIR=/tmp/my_data_2/apache-pinot-incubating-0.7.1-bin/bin
export ZOOKEEPER_ADDRESS=00.000.00.000:2181

export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"

export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:${GC_LOG_LOCATION}/gc-pinot-broker.log"

${PINOT_DISTRIBUTION_DIR}/bin/start-broker.sh -zkAddress ${ZOOKEEPER_ADDRESS} -configFileName ${SERVER_CONF_DIR}/broker.conf
```
@sleepythread: Perform a similar operation for almost all of the other services as well.
@patidar.rahul8392: Thanks a lot @sleepythread
@patidar.rahul8392: @tingchen @sleepythread @ken @slack1 @mayanks kindly help.
@syedakram93: Hi, I am running a query which involves AND, OR and some filters on string and long values. The table has about 34 million rows, and querying (selecting a few columns for an ID) takes almost 2 sec; numEntriesScannedInFilter (89 million) and numEntriesScannedPostFilter are large values. Can someone help me understand how so many entries are scanned in the filter when I am using an inverted index?
@mayanks: Are there any ORs that can be changed into INs? If so try that
@syedakram93: yeah we can do that
@syedakram93: After changing OR to IN wherever possible, it's giving the same time taken, but more records are scanned this time.
@mayanks: Can you paste the new query and the records scanned?
@mayanks: and also all columns that have sorted/inv index?
@syedakram93: Query:
```
SELECT AUDITLOGID, RECORDID, RECORDNAME, AUDITEDTIME, USERID, ACTIONTYPE, SOURCE, ACTIONINFO, OTHERDETAILS, DONEBY
FROM auditlog
WHERE ((((((RELATEDID = 553493000165096765) AND (RELATEDMODULE = 'Contacts')) AND (AUDITEDTIME >= 1588214155000)) AND ((((((((MODULE = 'Potentials') AND ((ACTIONTYPE = 19) OR (ACTIONTYPE = 20))) OR ((MODULE = 'Potentials') AND (((((OTHERDETAILS = 'Unattended Dialled') OR (OTHERDETAILS = 'Attended Dialled')) OR (OTHERDETAILS = 'Scheduled Attended')) OR (OTHERDETAILS = 'Received')) OR (OTHERDETAILS = 'Missed')))) OR (ACTIONTYPE IN (36,34,19))) OR ((ACTIONTYPE IN (10,11,1)) AND (SOURCE = 19))) OR ((ACTIONTYPE IN (10,11,19)) AND (SOURCE = 10))) OR ((ACTIONTYPE = 1) AND (MODULE = 'Potentials'))) OR (ACTIONTYPE = 69)))))
LIMIT 5000
```
Table config:
```
{
  "OFFLINE": {
    "tableName": "auditlog_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "replication": "1",
      "segmentPushType": "REFRESH"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "invertedIndexColumns": [
        "RELATEDID",
        "AUDITLOGID"
      ],
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "metadata": {
      "customConfigs": {}
    },
    "ingestionConfig": {},
    "isDimTable": false
  }
}
```
@mayanks: You didn’t change the otherDetails to IN?
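For reference, the OTHERDETAILS branch of the predicate above can be collapsed from the chained ORs into a single IN (same semantics, just a simpler filter expression):
```
-- before
... OR ((MODULE = 'Potentials') AND ((OTHERDETAILS = 'Unattended Dialled') OR (OTHERDETAILS = 'Attended Dialled') OR (OTHERDETAILS = 'Scheduled Attended') OR (OTHERDETAILS = 'Received') OR (OTHERDETAILS = 'Missed'))) ...

-- after
... OR ((MODULE = 'Potentials') AND (OTHERDETAILS IN ('Unattended Dialled', 'Attended Dialled', 'Scheduled Attended', 'Received', 'Missed'))) ...
```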
@syedakram93: totalDocs: 34 million, num of segments: 1, num of servers: 1, numEntriesScannedInFilter: 89627266
@syedakram93: otherDetails is basically a json string
@syedakram93: i will change that too
@syedakram93: I changed otherDetails too, but it's giving the same result/time taken and stats.
@syedakram93: My main concern is how *numEntriesScannedInFilter* is 89 million, which is a very high number.
@mayanks: You have inv index only on 2 columns?
@syedakram93: yes
@syedakram93: but when I checked metadata.properties, it shows almost all columns have an inverted index
@mayanks: Yeah that is misleading. It is only for the two columns you specified
@syedakram93: ohh
@syedakram93: Shall I specify the columns which I use to filter as inverted index columns?
@syedakram93: ACTIONTYPE, OTHERDETAILS, MODULE, SOURCE
@syedakram93: ?
@mayanks: Which ones of them have high cardinality?
@syedakram93: OTHERDETAILS will have the most unique values among these columns.
@mayanks: Ok but that is json?
@mayanks: As in configured as json for Pinot or Pinot thinks it is just a string?
@syedakram93: its string
@syedakram93: toString()
@mayanks: Ok, let’s start with that
@mayanks: And if that data size is not that much, may be set for all
@syedakram93: what do u mean by data size?
@mayanks: Total data size in Pinot
@syedakram93: segment tar.gz size is 639M
@syedakram93: I applied the inv index; now the time taken to finish the query is 393ms (earlier it was almost 2 sec), but numEntriesScanned is 2140274...
@syedakram93: Can I reduce numEntriesScanned some more?
@mayanks: Is this with inv index on all columns?
@mayanks: Also I do not see a sorted column
@syedakram93: Yes with 12 columns inv index and no sorted column
@syedakram93: Will apply the pk as the sorted index and check if entries scanned can be reduced.
@mayanks: Yeah, sorted column will help reduce further.
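As a sketch based on the filter columns in the query above (the column choices are illustrative, not a general recommendation), the tableIndexConfig could look like:
```
"tableIndexConfig": {
  "sortedColumn": ["RELATEDID"],
  "invertedIndexColumns": ["AUDITLOGID", "RELATEDMODULE", "MODULE", "ACTIONTYPE", "SOURCE", "OTHERDETAILS"],
  "loadMode": "MMAP"
}
```
Note that for an OFFLINE table the segment data actually has to be sorted on that column at generation time; the `sortedColumn` setting by itself only drives sorting for realtime consuming segments.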
@mayanks: Will the pk be part of most queries? And what kind of predicate would you have on that pk in the queries?
@syedakram93: check this for ref
@pedro.cls93: Hello, is there a way to check the current Kafka offset that a realtime table is reading at a given point in time?
@mayanks: Not at the moment, but you can check the previous saved offset from segment Zk metadata
@pedro.cls93: Thought as much, thank you @mayanks
@mayanks: Could you tell us a bit more on how you wanted to use it?
@pedro.cls93: My realtime table wasn't consuming from Kafka. These messages are stringified JSONs which frequently change schema or may occasionally have some errors. I wanted to understand what the last read offset for each Kafka partition was, to be able to accurately debug where Pinot stopped.
@mayanks: Got it. There are metrics emitted for consumption. You could see those going to zero when consumption would have stopped. That would give you the time-frame.
@npawar: You can use the consumingSegmentsInfo Api
@pedro.cls93: In my use-case the ingestion time-frame in Pinot does not correlate with Kafka's message time. The Kafka messages are computed in batch and sent in bulk to the Pinot ingestion topic.
@npawar: You'll find it on swagger under segments section
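For reference, it is a controller API along these lines (exact path and response fields may vary slightly by Pinot version):
```
GET http://<controller-host>:<controller-port>/tables/<tableName>/consumingSegmentsInfo
```
The response reports, per consuming segment, the server(s) consuming it and the latest ingested offsets per partition.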
@pedro.cls93: There seems to be an issue, the request does not reach the cluster?
@npawar: On the top left hand corner change http to https
@pedro.cls93: Brilliant! Thank you very much Neha!
@patidar.rahul8392: What is the process to set up a 3-node Pinot cluster with 2 controllers, 2 brokers, and 2 servers? I have separate servers for Zookeeper and Kafka, already up and running. Now on the remaining 3 servers I want to set up 2 controllers, 2 brokers, and 2 servers. Kindly suggest. Total available servers = 3; total components to install = 6.
@pedro.cls93: Hello, Can I combine
@npawar: No, that's not supported. However, a groovy script should be able to do all of the logic you need, right? What are you trying to achieve?
@pedro.cls93: Groovy should be sufficient, yes, I'm actively trying it now. For context: I have a json field in my table, let's call it `result`. I'm trying to retrieve a field from the `result` payload, called `duration`, and parse it into a Long field in the Pinot table representing the duration in milliseconds. The value of the `duration` property in the `result` field can have multiple forms: `HH:MM:SS.SSSS / HH:MM:SS / HH:MM:SS.SSSSSSS`, or it may simply not exist. To complicate matters, the key of `duration` is not consistent; when it appears, some messages have it as `Duration` and others as `duration`. This is legacy and not something I can fix upstream, unfortunately.
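For what it's worth, a rough, untested sketch of how that could be expressed as a single ingestion transform, assuming the Groovy sandbox allows plain string/regex operations; `durationMillis` is a hypothetical destination column and `-1` is an arbitrary sentinel for a missing or unparseable duration:
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "durationMillis",
      "transformFunction": "Groovy({if (result == null) return -1L; def m = result =~ /[dD]uration\\D*(\\d+):(\\d+):(\\d+\\.?\\d*)/; return m.find() ? (long)(m.group(1).toLong()*3600000 + m.group(2).toLong()*60000 + m.group(3).toDouble()*1000) : -1L}, result)"
    }
  ]
}
```
The regex accepts both `duration` and `Duration` keys and an optional fractional-seconds part, so `HH:MM:SS`, `HH:MM:SS.SSSS` and `HH:MM:SS.SSSSSSS` all reduce to milliseconds.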
@g.kishore: You can technically combine it but we don’t let groovy access any other classes for security reasons
#troubleshooting
@jmeyer: Hello. What is the recommended approach to getting the "last non-null value"? Use a UDF?
@pedro.cls93: Hello, does anyone know the meaning of the following warn log in the server: `2021/05/10 09:11:48.182 WARN [LLRealtimeSegmentDataManager_HitExecutionView__4__5__20210430T1719Z] [HitExecutionView__10__4__20210429T1719Z] Commit failed with response {"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"FAILED","offset":-1}` This is a realtime table, consuming messages from kafka with upsert enabled. I've noticed that the table hasn't been updating offsets for the past 3 days.
@pedro.cls93: I'm also getting the following exception from the server instances: ```2021/05/10 10:29:48.876 ERROR [ServerSegmentCompletionProtocolHandler] [HitExecutionView__13__6__20210510T1029Z] Could not send request```
@pedro.cls93: Controller is logging the following warnings at the same time: ```2021/05/10 10:32:07.285 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-data-(3c494c91_DEFAULT)] Event 3c494c91_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.```
@mayanks: Yeah, seems server is unable to connect to controller
@mayanks: Is controller accessible otherwise?
@pedro.cls93: Yes, the process has not suffered restarts and grafana monitoring does not report anything in particular. Java memory is manageable (84.3% of total), GC is working, CPU load is ok (1% of total)
@mayanks: Hmm can you log on to server and try to connect to controller?
@pedro.cls93: I can connect to the controller and get a bash shell yes
@tamas.nadudvari: Hello, we’re using `RealtimeToOfflineSegmentsTask` for our hybrid table and we ran into a problem in our dev environment. We have time gaps in our data ingest, and when a gap is larger than the bucket time period, the minion task runs into an error creating an empty segment. After the exception the minion fails to update the watermark, thus we end up with a stuck task (trying to create an empty segment over and over again for this specific period). While it’s unlikely we’ll run into this empty-segment problem in production, we’re wondering what the recommended way is to overcome this issue in a dev environment?
@mayanks: @jackie.jxt
@laxman: I’m also trying this feature in our test environment. Watermark is stored as ZK metadata. You can try updating the metadata manually. Try setting it to the next available segment start time stamp.
@npawar: thanks for the suggestion Laxman! @tamas.nadudvari do you mind creating a github issue for this? will see if it can be fixed easily
@npawar: or if you’d like to give it a shot to fix, that would be great too
@npawar: another thing i can think of is to increase the bucket time ?
@tamas.nadudvari: We’re also running into a strange problem with hybrid tables that might be connected to the completely empty realtime table (part): querying the hybrid table gives no records while querying the offline table (part) does. Filtering the time column to an interval for records in the offline table doesn’t help either. I failed to find any relevant log message that could help me identify what the issue might be.
@aaron: I'm trying to track down where my SegmentCreation job is spending time. It's been running for a few hours, but here are the only time measurements I see:
```
Assembled and processed 7990100 records from 17 columns in 121766 ms: 65.618484 rec/ms, 1115.5142 cell/ms
time spent so far 0% reading (237 ms) and 99% processing (121766 ms)
at row 7990100. reading next block
block read in memory in 44 ms. row count = 1418384
Finished building StatsCollector!
Collected stats for 9408484 documents
Created dictionary for INT column: ...
...
RecordReader initialized will read a total of 9408484 records.
at row 0. reading next block
Got brand-new decompressor [.gz]
block read in memory in 133 ms. row count = 7990100
Start building IndexCreator!
Assembled and processed 7990100 records from 17 columns in 127060 ms: 62.884464 rec/ms, 1069.0359 cell/ms
time spent so far 0% reading (133 ms) and 99% processing (127060 ms)
at row 7990100. reading next block
block read in memory in 26 ms. row count = 1418384
Finished records indexing in IndexCreator!
Finished segment seal!
...
Generated 25884 star-tree records from 9408484 segment records
Finished constructing star-tree, got 1228 tree nodes and 2058 records under star-node
Finished creating aggregated documents, got 1227 aggregated records
Finished building star-tree in 276631ms
Starting building star-tree with config: StarTreeV2BuilderConfig[...]
Generated 9408484 star-tree records from 9408484 segment records
```
It's been stuck at that last line for a long time but still seems to be doing something. All the time measurements only add up to a few minutes. Any idea where it's spending time?
@ken: There was a similar question recently in <#CDRCA57FC|general>, where GCing could be a potential source of slowness (trying to include a ref, let’s see if this works:
@fx19880617: I feel it’s mostly spent on startree generation
@avasudevan: Hi, could you guys help me with the Avro -> Pinot schema conversion?
Avro schema:
```
{
  "type": "record",
  "name": "Clickstream",
  "namespace": "com.acme.event.clickstream.business",
  "fields": [
    {
      "name": "event_header",
      "type": {
        "type": "record",
        "name": "EventHeader",
        "namespace": "com.acme.event",
        "fields": [
          {
            "name": "event_uuid",
            "type": {
              "type": "string",
              "avro.java.string": "String",
              "logicalType": "uuid"
            },
            "doc": "Universally Unique Identifier for this event"
          },
          {
            "name": "published_timestamp",
            "type": {
              "type": "long",
              "logicalType": "timestamp-millis"
            },
            "doc": "Timestamp in milliseconds since the epoch that the event occurred on its producing device. e.g. <code>System.currentTimeMillis()</code>."
          }
        ]
      }
    }
  ]
}
```
The corresponding Pinot schema I have is:
```
{
  "schemaName": "user_clickstream_v1",
  "dimensionFieldSpecs": [
    {
      "name": "event_header.event_uuid",
      "dataType": "STRING"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "event_header.published_timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```
In the created Pinot table I see all the values as null… I suspect the issue is in the schema. Any idea?
@npawar: did you set transform functions in the table config?
@npawar: this demo explains what transform function you’d have to set:
@npawar: treat the avro hierarchy similar to the json as shown in the demo
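If it helps, here is a hedged sketch of the kind of transform configs meant here, extracting the nested `event_header` fields into the dotted column names from the schema (treat it as illustrative; the exact behavior of the jsonPath functions on Avro records may vary by version):
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "event_header.event_uuid",
      "transformFunction": "jsonPathString(event_header, '$.event_uuid')"
    },
    {
      "columnName": "event_header.published_timestamp",
      "transformFunction": "jsonPathLong(event_header, '$.published_timestamp')"
    }
  ]
}
```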
#pinot-dev
@syedakram93: Hi, I am running a query which involves AND, OR and some filters on string and long values. The table has about 34 million rows, and querying (selecting a few columns for an ID) takes almost 2 sec; numEntriesScannedInFilter (89 million) and numEntriesScannedPostFilter are large values. Can someone help me understand how so many entries are scanned in the filter when I am using an inverted index?
@ssahay: @ssahay has joined the channel
@snlee: Hi all, I would like to use this thread for discussing the 0.8.0 release. Some points that we need to discuss: • Should we cut the release before or after the graduation? • Let’s set the timeline so that people can check in the features to be included in the release. I think that we should make the call first on before/after graduation and then decide the timeline.
@mayanks: Thanks @snlee, my thoughts: ```1. If the graduation process requires a release, let's cut it before. 2. If it does not, then lets fix things like licensing at least. 3. Let's collect data if we have enough features since 0.7.1 that calls for a new release.```
#minion-improvements
@jackie.jxt: The `NullValueTransformer` is the one to fill default value for `null` values. Currently in the `PinotSegmentRecordReader` we don't fill the `null` values back to the record, and that causes the null value vector not being reconstructed.
@jackie.jxt: In `MutableSegmentImpl.getRecord()` we have logic to backfill the `null` values, and we should do similar operation for `PinotSegmentRecordReader`
@laxman: I spent some time over the weekend and today. I think I got the exact root cause. Testing a patch.
@laxman: I think the major root cause for these null issues is pinot to avro schema conversion. While converting we treat all avro fields as non-nullable fields.
@laxman: Fixes needed in this schema conversion (see the example below): • Nullable fields need to use unions (union {null, int} for a nullable int type) • Default values need to be set properly
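For example, a nullable int field in the converted Avro schema would be declared with a null union and a null default (field name here is illustrative):
```
{
  "name": "someNullableIntField",
  "type": ["null", "int"],
  "default": null
}
```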
@laxman: @jackie.jxt /@fx19880617: Does this make sense? I’m not very sure whether this fixes the null vector problem, but it will fix these NPEs in Avro properly.
@fx19880617: I see, then shall we always convert a field to nullable in Avro?
@laxman: Yes. Always a union {null, actual type}
@npawar: I don't understand why a field in the segment will be nullable. When the segment was created in the first place, the NullValueTransformer would have put the default value correctly, right?
@laxman: @npawar: Following is one case where a field value can be null. > We are seeing issues with nullable fields added recently. These fields were not there in the old segments. And we are seeing NPE issues while converting old REALTIME segments to OFFLINE.
@npawar: I see. So it doesn't exist in the segment at all.
@laxman: Yes. Examples & issue
@npawar: in an ideal scenario, after you add the new field, you would invoke a reloadSegments, which would generate the new column in the old segments too
@npawar: i think your solution is also fine. But we might have to be careful if the filter/record transformation/aggregation steps need to happen on the new null field. We’d have to handle null in each of those steps
@laxman: Does reloading segments update the deep store as well? IIUC, the conversion task loads the segments from the deep store, hence the question.
@npawar: ah, it does not. Hmm this is going to be a problem then.. Especially now that we’ve added derived fields
@laxman: Can you please explain a little more? Can I proceed with the union approach during the pinot->avro schema conversion? Or will that cause other problems?
@npawar: You’re right, the conversion task will download from the deep store. That segment copy will not have the default values for the new column. If you proceed with your approach, we will face problems in the further steps of the SegmentProcessor, like filtering, partitioning, etc., in case they start using the new column. To avoid this, Jackie’s suggestion is better, of adding the default values for null values inside the PinotSegmentRecordReader. The bigger problem I’m thinking about is that we’ve introduced the concept of derived columns. So let’s say a segment has columns `x,y`. Now someone can add a new column `z=x+y`, for example, wherein the value of the new column is derived from the existing values. But this new value is also not updated in the deep store. In this case, during the RealtimeToOffline job, it is not correct to treat it as null, neither is it correct to give it the default value.
@jackie.jxt: After transforming the input record with `CompositeTransformer`, the `GenericRow` should have `_nullValueFields` set for `null` fields. We should store them into the avro, then read them out in the reducer