#general
@otiennosharon: @otiennosharon has joined the channel
@msoni6226: Hi Team, my teammates @bajpai.arpita746462 and @vibhor.jain and I have documented our understanding of how to handle duplicate data in Pinot in this
@msoni6226: Hi Team, I was going through the Pinot Upsert flow documentation and youtube video(
@yupeng: that's a cool blog. thanks for sharing. for your questions, you can read the design doc on the whys
@msoni6226: Thanks Yupeng for pointing out this document. It has all the answers to my questions
@karinwolok1: :speaker: DeveloperWeek is looking for speakers!!! Interested in presenting about what you're doing with Pinot? Submit here:
@karinwolok1: DevNexus also looking for speakers! Let's get you all in the developer speaker circuit! :smile:
@karinwolok1: SIGMOD is another one! All about databases. Get your talk proposals submitted!
@roland.vink: @roland.vink has joined the channel
@rohan.a.suri: @rohan.a.suri has joined the channel
#random
@otiennosharon: @otiennosharon has joined the channel
@roland.vink: @roland.vink has joined the channel
@rohan.a.suri: @rohan.a.suri has joined the channel
#feat-compound-types
@otiennosharon: @otiennosharon has joined the channel
#apa-16824
@otiennosharon: @otiennosharon has joined the channel
#troubleshooting
@otiennosharon: @otiennosharon has joined the channel
@zsolt: We are using the offlineSegmentDelayHours metric for monitoring if the RealtimeToOffline task is stuck, and since upgrading to 0.8.0 we see stale values for it. Prior to 0.8.0 the metrics were present only on one controller, but now they can be on multiple controllers. I've found that 0.8.0 enables Controller Resource by default, so the tables can have different controllers as leaders. We couldn't find a metric to decide which controller is the leader for a table, so we can't filter out the stale metric for alerts. IMO these metrics should be removed for the table once leadership is lost, or there should be a gauge which can be used to decide if a controller is leader for a table.
@g.kishore: Good point. @jlli any thoughts on this?
@jlli: Yeah, that sounds fair. Let me try to make the change
@jlli: @zsolt the `PinotLeadControllerRestletResource` specifies the APIs to check the leadership of pinot tables. Please take a look
@g.kishore: @jlli that may not help. I think we should have a solution around not emitting metrics for a table if the controller is not the leader for that table
@g.kishore: otherwise, monitoring and alerting will be hard
@g.kishore: @zsolt what tool are you using for monitoring? will adding up the metrics across all controllers help?
@zsolt: We are using the JMX metrics with prometheus agent
@jlli: @g.kishore That’s also true. We should also clean up metrics if the current controller is not the leader for that table. Since controller periodic tasks are run periodically, we can do the cleanup there
@mayanks: Thanks @jlli, mind filing an issue to capture the problem and track progress?
@jlli: Sure, will file an issue
@zsolt: Thanks!
@jlli: Here it is:
@jlli: This is the PR for the issue above:
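Until stale gauges are cleaned up on the controller side, the filtering discussed in this thread could be done on the monitoring side. The following is a minimal sketch, assuming a table-to-leader mapping is available from somewhere (for instance the leadership APIs mentioned above); the `Sample` type, the controller names, and the `leader_samples` helper are all hypothetical and not part of Pinot.

```python
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    controller: str  # controller instance that exported the gauge (hypothetical id)
    table: str       # table the gauge refers to
    value: float     # e.g. the offlineSegmentDelayHours reading


def leader_samples(samples: List[Sample], leaders: Dict[str, str]) -> List[Sample]:
    """Keep only samples reported by the controller that currently leads the table.

    `leaders` maps table name -> leader controller id. How that mapping is
    obtained is an assumption of this sketch.
    """
    return [s for s in samples if leaders.get(s.table) == s.controller]


samples = [
    Sample("controller-1", "events", 30.0),  # stale: controller-1 lost leadership
    Sample("controller-2", "events", 1.0),   # fresh: controller-2 is the leader
]
leaders = {"events": "controller-2"}
fresh = leader_samples(samples, leaders)
```

Samples for tables with no known leader are dropped, which errs on the side of missing an alert rather than firing on a stale value; the opposite choice may suit some setups better.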
@deemish2: Hello, we are using a Spark batch ingestion job to push data into a Pinot offline table using pinot-0.8.0, and we are getting this exception: Caused by: groovy.lang.MissingPropertyException: No such property: date for class: SimpleTemplateScript1\n\tat org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:66)
@xiangfu0: I think your ingestion job spec has Groovy templates but the properties are not set
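A quick pre-flight check can show which template properties a job spec expects before the job is launched. The sketch below is an illustration only: it handles simple `${name}` placeholders (not arbitrary Groovy expressions), and the spec line, bucket name, and `missing_properties` helper are hypothetical.

```python
import re
from typing import Dict, Set

# Matches ${name} placeholders; plain $name references and Groovy
# expressions such as ${date.format(...)} are not handled in this sketch.
PLACEHOLDER = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def missing_properties(spec_text: str, values: Dict[str, str]) -> Set[str]:
    """Return placeholder names used in the spec that have no supplied value."""
    return {name for name in PLACEHOLDER.findall(spec_text) if name not in values}


# Hypothetical spec line that references two template properties.
spec = "inputDirURI: 's3://my-bucket/rawdata/${year}/${date}'"
missing = missing_properties(spec, {"year": "2021"})
```

Here `missing` contains `date`, which mirrors the `No such property: date` error above: the template refers to a property that was never passed to the job.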
@roland.vink: @roland.vink has joined the channel
@rohan.a.suri: @rohan.a.suri has joined the channel
#feat-geo-spatial-index
@otiennosharon: @otiennosharon has joined the channel
@kchavda: Does anyone have a working example of using anything other than the stPoint & toSphericalGeography functions?
@yupeng: you can read this blog
@kchavda: Thanks for sharing @yupeng.
@kchavda: This query bombs on me from query console ```select toGeometry(base64Decode('AQEAACDmEAAAT0wojs27YsA0a4TZX5hOQA==')) from meetupRsvp limit 10``` Am I using the function correctly here? ```org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception while invoking method: public static byte[] org.apache.pinot.core.geospatial.transform.function.ScalarFunctions.toGeometry(byte[]) with arguments: [[B@f7f6884]```
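The thread does not record an answer, but inspecting the bytes being passed can narrow the problem down. The sketch below uses only the Python standard library and assumes the payload is a WKB or PostGIS-style EWKB point; the `describe_point` helper is hypothetical. For the string in the query it reports a point with an embedded SRID of 4326, which suggests EWKB. Whether `toGeometry` expects a different serialization is not established here.

```python
import base64
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB sets this bit when an SRID is embedded


def describe_point(payload_b64: str) -> dict:
    """Parse a base64-encoded WKB/EWKB point and report its header fields."""
    raw = base64.b64decode(payload_b64)
    order = "<" if raw[0] == 1 else ">"  # first byte: 1 = little-endian, 0 = big-endian
    (type_word,) = struct.unpack_from(order + "I", raw, 1)
    offset = 5
    srid = None
    if type_word & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", raw, offset)
        offset += 4
    geometry_type = type_word & 0xFF  # 1 = Point in WKB
    if geometry_type != 1:
        raise ValueError(f"this sketch only handles points, got type {geometry_type}")
    x, y = struct.unpack_from(order + "dd", raw, offset)
    return {"geometry_type": geometry_type, "srid": srid, "x": x, "y": y}


info = describe_point("AQEAACDmEAAAT0wojs27YsA0a4TZX5hOQA==")
```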
#docs
@otiennosharon: @otiennosharon has joined the channel
#aggregators
@otiennosharon: @otiennosharon has joined the channel
#dhill-date-seg
@otiennosharon: @otiennosharon has joined the channel
#enable-generic-offsets
@otiennosharon: @otiennosharon has joined the channel
#community
@otiennosharon: @otiennosharon has joined the channel
#announcements
@otiennosharon: @otiennosharon has joined the channel
#discuss-validation
@otiennosharon: @otiennosharon has joined the channel
#config-tuner
@otiennosharon: @otiennosharon has joined the channel
#getting-started
@sirsh: I'm running my first batch ingestion job, ingesting from S3 parquet files. The task was kicked off and the 8 rows of the input sample are read, but then it fails and I'm not sure what the error message is telling me. What is the illegal argument in this context? I did not get any closer looking at the source for Segment Name Generator... ```RecordReader initialized will read a total of 8 records. at row 0. reading next block block read in memory in 1 ms. row count = 8 Start building IndexCreator! Finished records indexing in IndexCreator! Failed to generate Pinot segment for file -
@sirsh: ```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://...'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://...'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'MY_TABLE'
  schemaURI: '
@sirsh: OK, I'm missing a `segmentNameGeneratorSpec`. I realize it's helpful to scan through the logs above the error and observe where some parameters are null, because sometimes it matters!
@roland.vink: @roland.vink has joined the channel
@sirsh: When I query Presto and there is a column named with a reserved keyword like `timestamp`, I cannot seem to submit a query that includes `"timestamp"`, even though the Presto spec suggests it can be escaped with double quotes. It might be specific to the clients I am using; I have tried a freshly downloaded presto-cli and a Python client, and both result in a PQLParsingError. What should I do in this situation? (This is testing the presto-pinot connector, so maybe not a Pinot question for this channel.)
@xiangfu0: right, the generated Pinot query also needs to be escaped, but that is not done in the presto-pinot connector
@xiangfu0: so it requires a fix for that
@sirsh: thanks for the context @xiangfu0 :pray:
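The kind of fix described above amounts to quoting reserved identifiers when the downstream query text is generated. The following is a minimal sketch of that idea, not the connector's actual code: the reserved-word list is illustrative and incomplete, the quote character is left as a parameter because the right one depends on the target query dialect, and `quote_identifier` and `build_select` are hypothetical helpers.

```python
from typing import AbstractSet, Sequence

# Illustrative, incomplete set of reserved words (an assumption of this sketch).
RESERVED: AbstractSet[str] = frozenset({"timestamp", "date", "time", "table", "order", "group"})


def quote_identifier(name: str, reserved: AbstractSet[str] = RESERVED, quote: str = '"') -> str:
    """Wrap the identifier in quotes if it collides with a reserved word."""
    if name.lower() in reserved:
        escaped = name.replace(quote, quote * 2)  # double any embedded quote character
        return f"{quote}{escaped}{quote}"
    return name


def build_select(columns: Sequence[str], table: str, limit: int = 10) -> str:
    """Generate a simple SELECT with reserved identifiers quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {cols} FROM {quote_identifier(table)} LIMIT {limit}"


query = build_select(["timestamp", "userId"], "events")
```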
#feat-partial-upsert
@otiennosharon: @otiennosharon has joined the channel
#debug_upsert
@otiennosharon: @otiennosharon has joined the channel
#complex-type-support
@otiennosharon: @otiennosharon has joined the channel
#kinesis_help
@abhijeet.kushe: It actually does work: the moment it consumes the latest message, the iterator age drops to 0. I am not aware of how the iterator age is supposed to be reflected when the shard iterator is AT_SEQUENCE_NUMBER.
@abhijeet.kushe:
@npawar: cool
@npawar: created issue for this