#general
@surajkmth29: Hi Team, as part of a POC we are trying to load Pinot table data into a Spark DataFrame using the Spark JDBC option. However, when we try, we see the following error: ```Exception in thread "main" java.sql.SQLFeatureNotSupportedException
    at org.apache.pinot.client.base.AbstractBaseStatement.setQueryTimeout(AbstractBaseStatement.java:167)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)```
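For context, a minimal sketch of the kind of read that hits this error path (driver class and URL scheme per the Pinot JDBC client; host, port, and table name are placeholders): Spark resolves the schema through `JDBCRDD.resolveTable`, which calls `Statement.setQueryTimeout`, a method the Pinot driver does not implement.
```scala
// Hypothetical sketch of the failing read; URL and table name are placeholders.
import org.apache.spark.sql.SparkSession

object PinotJdbcRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pinot-jdbc-poc").getOrCreate()
    val df = spark.read
      .format("jdbc")
      .option("driver", "org.apache.pinot.client.PinotDriver")
      .option("url", "jdbc:pinot://pinot-controller:9000")
      .option("dbtable", "myTable")
      .load() // schema resolution triggers the SQLFeatureNotSupportedException above
    df.show()
  }
}
```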
@kharekartik: Hi, not all features of the JDBC driver are supported currently, e.g. the setQueryTimeout method here. Can I understand your use case so that I can suggest some alternatives?
@surajkmth29: We are exploring the possibility of querying Apache Pinot using Spark JDBC to gather distinct column values from a table.
@kharekartik: Why not get the distinct values directly from Pinot?
@kharekartik: It will be much more efficient
@surajkmth29: Hi Kartik, below is the flow: Pinot table --> Spark (read, filter, transform) --> use column data to fetch data from Postgres
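A minimal sketch of the alternative suggested above, assuming the pinot-java-client dependency (broker address, table, and column names are placeholders): push the DISTINCT down to Pinot and read the result directly, instead of going through Spark JDBC.
```scala
// Sketch: fetch distinct values directly from a Pinot broker via pinot-java-client.
import org.apache.pinot.client.ConnectionFactory

object DistinctFromPinot {
  def main(args: Array[String]): Unit = {
    // placeholder broker address
    val connection = ConnectionFactory.fromHostList("pinot-broker:8099")
    try {
      val results = connection.execute("SELECT DISTINCT myColumn FROM myTable LIMIT 100000")
      val resultSet = results.getResultSet(0)
      (0 until resultSet.getRowCount).foreach(row => println(resultSet.getString(row, 0)))
    } finally {
      connection.close()
    }
  }
}
```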
@mariums82: @mariums82 has joined the channel
@ysuo: When I add a table, the following error occurred. What are the possible reasons for it? ```{"code":500,"error":"org.apache.kafka.common.KafkaException: Failed to construct kafka consumer"}```
@mark.needham: that usually means Pinot can't connect to Kafka
@mark.needham: not sure why - maybe it can't access the kafka broker
@mark.needham: or it can't authenticate?
@satyam.raj: you can check the pinot-server logs
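Since "Failed to construct kafka consumer" almost always comes from the consumer properties, a minimal `streamConfigs` sketch for comparison (topic name and broker address are placeholders, assuming the Kafka 2.x plugin):
```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.broker.list": "kafka-broker:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
}
```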
@kstobbelaar: @kstobbelaar has joined the channel
@aboagyemichaelk: Hi folks, which page replacement algorithm does Pinot use?
@ysuo: Hi, I have some questions about creating a realtime table to ingest data from a Kafka topic.
1. If the topic hasn't been created yet, will the table be added to Pinot? In my tests, Pinot 0.9.3 failed and returned an error, while Pinot 0.10 created the table.
2. Pinot returned a 'Failed to construct kafka consumer' error and the table could not be seen in the Pinot Web UI. But when I tried to add the table again with the same table config and schema, an error occurred with a message saying the table already exists. So I had to rename the table, but got the same error. If I try this several times, there will be several tables that exist in Pinot but cannot be seen in the Pinot Web UI. Is there some way I can delete those tables?
3. Any idea of the reason for the error 'Could not get PartitionGroupMetadata for topic xyz'?
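On question 2, a hedged note: a table that exists in Pinot but is not visible in the UI can usually be dropped through the controller REST API; controller host/port and table name below are placeholders.
```
# placeholders: controller host/port and table name
curl -X DELETE "http://pinot-controller:9000/tables/myTable?type=realtime"
```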
#random
@mariums82: @mariums82 has joined the channel
@kstobbelaar: @kstobbelaar has joined the channel
#troubleshooting
@diana.arnos: So, I'm having a similar problem again: ```Find ERROR segment: <tableName>__100__45__20220409T0856Z, table: <tableName>_REALTIME, expected: ONLINE
Sleeping 1 second waiting for all segments loaded for partial-upsert table: <tableName>_REALTIME``` When I try to reload the segment, I see this message in the logs: ```Reloading single segment: <tableName>__100__45__20220409T0856Z in table: <tableName>_REALTIME
Segment metadata is null. Skip reloading segment: <tableName>t__100__45__20220409T0856Z in table: <tableName>_REALTIME```
@francois: Hi, facing a new weird issue :smile: ```java.lang.RuntimeException: Exception getting FSM for segment candidates__0__0__20220408T1410Z
    at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.lookupOrCreateFsm(SegmentCompletionManager.java:175) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentConsumed(SegmentCompletionManager.java:202) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentConsumed(LLCSegmentCompletionHandlers.java:144) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at jdk.internal.reflect.GeneratedMethodAccessor128.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at java.lang.Thread.run(Thread.java:829) [?:?]``` My controller is looking for a non-existing segment :confused: Is there an API way to tell it that the segment doesn't exist anymore? :smile: The table is still accessible and everything is OK, but I found it weird.
@francois: Edit: for documentation -> restarting the server service fixed it :wink:
@ssubrama: Is this the complete log? Or are there some lines before this one, indicating a null pointer exception or something like that? It does seem like your server had some issues, since it is trying to complete a segment that does not exist anywhere (my guess). Not sure how that happened. Server logs may convey some more information. This almost feels like the server VM was frozen for several days, and suddenly allowed to run again.
@francois: Yes, those are the complete logs. It looks like a brutal restart of the server :wink:
@ssubrama: Lol, look at the comments that I added before throwing the exception. There is a log line before the exception, and it may help if you can cut/paste that line from the controller log.
@mariums82: @mariums82 has joined the channel
@ysuo: Hi, the realtimeToOffline task tried to process some segments that had been deleted and moved to the Deleted Segments folder, and the following error occurred. What should I do if I want Pinot to resume processing other segments and ignore the deleted ones? ```org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 404 (Not Found) with reason: "Segment telemetrics_data__0__0__20220408T0702Z or table telemetrics_data not found in /var/pinot/controller/data/telemetrics_data/telemetrics_data__0__0__20220408T0702Z"```
@npawar: the realtimeToOffline task will not pick segments to move unless they were in the ideal state. So it looks like the retention is set too aggressively, and the realtimeToOffline is not able to catch up?
@npawar: afaik, the next run of the minion task should recover this. are you not seeing that?
@npawar: Tagging @xiaobing our minion tasks expert for further analysis if needed
@ysuo: @npawar, I think my test here was a little complex. The realtimeToOffline task failed due to a large bucket window and limited resources. Then I modified retention from 10 days to 2 days, so segments older than 2 days were deleted. After that, the task failed with the data-not-found error and kept returning the same error.
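For reference, the bucket window mentioned here lives in the table's task config; a minimal sketch with illustrative values (not a recommendation):
```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "6h",
      "bufferTimePeriod": "2d"
    }
  }
}
```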
@xiaobing: hi Alice, we could check the logs for some clues. In the Pinot controller logs, you should be able to find some logs about the task scheduling (like by
@eduardo.cusa: Hello guys, similar to this
@npawar: You'll see the full stack trace and reason in the pinot-server logs. This is usually resource related; another common issue is not being able to access the deep store.
@npawar: @mark.needham perhaps we need a doc page now, on debugging BAD segments? I dunno if you already have one
@eduardo.cusa: yes, based on the stack trace it seems like it is unable to download the segment. In this case I'm using local storage (deep store is not set up), and it seems like it is trying to get the segment from deep store: `org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:393)`
@mark.needham: @npawar we don't have one, but yeah, we should have one. But you'll have to help me with the content as I don't know how to debug/solve bad segments!
@npawar: can you share the entire stack trace?
@eduardo.cusa: Yes, this is the response from `
@npawar: isn’t there a longer form of this stack trace in the pinot-server logs, that has “Caused by” section?
@luisfernandez: I get this issue a lot when I'm moving pods around/restarting and whatnot
@eduardo.cusa: Yes, this is the log from the server:
@npawar: that’s strange. does it clear if you call resetSegment or restart the server?
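For reference, resetSegment is exposed on the controller API; a sketch with placeholder host, table, and segment names:
```
# placeholders: controller host/port, table name with type, segment name
curl -X POST "http://pinot-controller:9000/segments/myTable_REALTIME/mySegment/reset"
```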
@npawar: can you check the controller conf and server conf to make sure there are no extra configs about FS?
@eduardo.cusa: Server: ```root@pinot-server-0:/var/pinot/server/config# cat pinot-server.conf
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment``` Controller: ```root@pinot-controller-0:/opt/pinot# cat /var/pinot/controller/config/pinot-controller.conf
controller.helix.cluster.name=pinot-test
controller.port=9000
controller.vip.host=pinot-controller
controller.vip.port=9000
controller.data.dir=/var/pinot/controller/data
controller.zk.str=pinot-zookeeper:2181```
@shaileshjha061: Hi Team, can anyone assist with this? We have not been able to consume messages from Kafka since yesterday. When I check the debug information for the table, I'm not sure how to proceed. Can anyone assist?
@mark.needham: which version of Pinot?
@mark.needham: did the format of the data in the timestamp field change in the last day?
@shaileshjha061: > which version of Pinot?
Pinot Version: 0.7.1, Pinot Chart Version: 0.2.5-SNAPSHOT
@shaileshjha061: > did the format of the data in the timestamp field change in the last day? No
@mark.needham: can you share the schema config for the timestamp field and an example of the data for that field?
@shaileshjha061: Thanks for the reply @mark.needham. Here's the schema config: ```"dateTimeFieldSpecs": [
  {
    "name": "timestamp",
    "dataType": "STRING",
    "format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy/MM/dd HH:mm:ss'Z'",
    "granularity": "1:SECONDS"
  }
]```
@shaileshjha061: `"timestamp":"2022-04-11T05:13:27Z"` Here's an example of the data for that field
@mark.needham: in the format it seems like it's missing the `T` - I would expect it to be: ```"format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy/MM/dd''T''HH:mm:ss'Z'",```
@mark.needham: although I'm not sure how it would have been working before 1 day ago as that issue would have still existed
@francois: The format does not match: the data is using "yyyy-MM-dd''T''HH:mm:ss'Z'" while the schema is declared as "yyyy/MM/dd HH:mm:ss'Z'"
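Putting the two corrections together (the missing `T` and the hyphen separators), a sketch of a spec matching the sample value `2022-04-11T05:13:27Z`, using the same escaping convention as the messages above:
```json
"dateTimeFieldSpecs": [
  {
    "name": "timestamp",
    "dataType": "STRING",
    "format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy-MM-dd''T''HH:mm:ss'Z'",
    "granularity": "1:SECONDS"
  }
]
```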
@shaileshjha061: Thanks, let me update the schema and check. Hope updating will not cause any problems for the data inside the table.
@kstobbelaar: @kstobbelaar has joined the channel
@erik.bergsten: Hello!
@erik.bergsten: We have tagged one server as DefaultTenant_OFFLINE and one as DefaultTenant_a_OFFLINE (using the web gui) and then created a table with the following config: ```{
  "tableName": "environment",
  "tableType": "OFFLINE",
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "segmentsConfig": {
    "schemaName": "environment",
    "timeColumnName": "ts",
    "replication": "1",
    "replicasPerPartition": "1",
    "retentionTimeUnit": null,
    "retentionTimeValue": null,
    "segmentPushType": "APPEND",
    "segmentPushFrequency": "DAILY",
    "crypterClassName": null,
    "peerSegmentDownloadScheme": null
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [],
    "createInvertedIndexDuringSegmentGeneration": false,
    "rangeIndexColumns": [],
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "bloomFilterConfigs": null,
    "noDictionaryColumns": [],
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "starTreeIndexConfigs": null,
    "enableDynamicStarTreeCreation": false,
    "segmentPartitionConfig": null,
    "columnMinMaxValueGeneratorMode": null,
    "nullHandlingEnabled": false
  },
  "metadata": {},
  "ingestionConfig": {
    "filterConfig": null,
    "transformConfigs": [
      {
        "columnName": "ts",
        "transformFunction": "FromDateTime(\"DepartureDate\", 'yyyy-MM-dd''T''HH:mm:ss.SSSZ')"
      }
    ]
  },
  "quota": {
    "storage": null,
    "maxQueriesPerSecond": null
  },
  "task": null,
  "routing": {
    "segmentPrunerTypes": null,
    "instanceSelectorType": null
  },
  "instanceAssignmentConfigMap": null,
  "query": {
    "timeoutMs": null
  },
  "fieldConfigList": null,
  "upsertConfig": null,
  "tierConfigs": [
    {
      "name": "tierA",
      "segmentSelectorType": "time",
      "segmentAge": "5m",
      "storageType": "pinot_server",
      "serverTag": "DefaultTenant_a_OFFLINE"
    }
  ]
}``` We expected to see segments moved from DefaultTenant_OFFLINE to the other server 5 mins after ingestion (using batch ingestion) but nothing seems to happen. Is there anything obviously wrong in the config? How should we pursue solving this problem? We cannot find any errors / interesting messages in any log.
@npawar: the 5m in the tier config means that segments older than 5m will be moved, but not necessarily 5m after ingestion
@npawar: the movement is done by a periodic task on the controller, which runs every hour
@npawar: you can set this config on controller to be more aggressive if you want to see movement happen sooner: `controller.segment.relocator.frequencyPeriod=10m`
@npawar: Updated the Tier Config doc with this info:
@luisfernandez: question: when restarting servers that have already been consuming data, and the server has to download data from an external deep store on restart, is there any way to know when the server is ready to be queried? I feel like a check against the `health` endpoint wouldn't suffice, or would it? I'm asking because we are making a change in production, and I've noticed that when we do rolling restarts with `kubectl`, the servers come back but are still trying to catch up on the data they missed, so I'm trying to figure out a way to do this safely.
@luisfernandez: another question: is it me, or does the minion not have a `health` endpoint? what do you check for the livenessProbe there?
@grace.lu: Hi team, I have the following issues and questions about Pinot Spark batch ingestion:
1. The ingestion job fails when the input parquet data contains a timestamp-type column; it seems to be related to the unsupported INT96 timestamp type issue. Is the workaround here just preprocessing the data to cast it to another format?
2. Can the partition columns in the input path be understood by the ingestion job? eg: if I have
@mayanks: 1. Use native parquet reader
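For reference, a sketch of the relevant `recordReaderSpec` section of the batch ingestion job spec when switching to the native Parquet reader (which handles INT96 timestamps); the class name comes from the Pinot parquet input-format plugin, the rest is illustrative:
```yaml
# assumption: standalone/spark ingestion job spec YAML; only this section changes
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'
```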
@mayanks: 2. @xiaobing
@grace.lu: I tried the native parquet reader but it failed with the same weird error as in that old thread: ```Caused by: java.io.FileNotFoundException: File does not exist: /mnt/data/yarn/usercache/grace/appcache/application_1641717365497_8097/container_1641717365497_8097_01_000002/tmp/pinot-56f5f6ca-c206-4896-b51e-f490c8b04893/input/part-00070-f1c41c9a-8f47-4ef2-9d2b-376ad995e591.c000```
@xiaobing: for 2, I don't think it's feasible right now. A potential workaround could be: 1) derive the dt value from a column still in the data file (
@grace.lu: yeah it’s a bit tricky for us because we don’t have a time column in the data file that carries the same information that we can derive from. I think we will need to further process the upstream data to prepare it for pinot ingestion. Thanks for looking! :pray:
@very312: hi team, I have a question on schema evolution for a hybrid table. Is it possible to add a new column to the realtime table while the table is still a hybrid table? (i.e. table a is already a hybrid table, and I want to add a new column to table a's realtime table, since past data in the offline table does not have any values for the new column.) If possible, do we just need to follow
@mayanks: Yes schema evolution is supported for hybrid tables
@mayanks: This includes consuming segments
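A hedged sketch of the usual flow for adding the column (controller host, schema name, and table name are placeholders): update the schema through the controller, then reload segments so existing segments pick up the new column with default values.
```
# placeholders: controller host/port, schema file/name, table name
curl -X PUT -H "Content-Type: application/json" -d @schema.json \
  "http://pinot-controller:9000/schemas/mySchema"
curl -X POST "http://pinot-controller:9000/segments/myTable/reload?type=REALTIME"
```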
#getting-started
@mariums82: @mariums82 has joined the channel
@ysuo: Hi team, if I customize a new plugin and want to deploy it in Kubernetes, what should I do if I don’t want to build a new image?
@ysuo: When I set retentionTimeValue to 1 day for a table, segments older than 1 day are deleted and moved to the Deleted Segments folder. How long will the data stay there? Is there a default duration?
@kaushalaggarwal349: I am also trying to understand the same thing. I want to retain data for 6 months; any clarity on how to do this would be appreciated!
@mark.needham: the default time in the deleted segments folder is 7 days
@mark.needham: for a 6-month retention you can set the retention period to 6 months in the segmentsConfig
@mark.needham: ```"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "180"
}```
@kaushalaggarwal349: Okay thanks.
@ysuo: Can I change the default time in the deleted segments folder? @mark.needham
@mark.needham: I think it's hardcoded
@mark.needham: at least that's how it seems from my code reading
@mark.needham: maybe create a GitHub issue if you need it to be configurable -
@ysuo: I see. Thanks.
@npawar: there is a config. let me dig it up
@npawar: ```controller.deleted.segments.retentionInDays``` @ysuo
@ysuo: Thanks @npawar
@kaushalaggarwal349: which column is used as the primary key in Pinot?
@kstobbelaar: @kstobbelaar has joined the channel
#jobs
@aboagyemichaelk: @aboagyemichaelk has joined the channel