#general
@patidar.rahul8392: Hello everyone, I am creating one hybrid table which can ingest data from a Kafka topic (streaming data) as well as from an HDFS location (batch ingestion). I am aware of the stream ingestion process for Kafka topics and have created multiple realtime tables. Now I am creating one hybrid table; for one of the Kafka topics, the data is also available at an HDFS location. I am going through the documents, but in offline-table-config.json I couldn't find any property where we pass the source location as an HDFS location. Kindly suggest what the process is to also ingest from HDFS into the same table.
@fx19880617: it's in the server config; for the server conf, you should already have it
@patidar.rahul8392: In the server.conf file we have 2 locations, pinot.server.instance.dataDir and pinot.server.instance.segmentTarDir, and both are local paths which I gave while creating the configuration file for deep storage. Is this the same location you are talking about? @fx19880617
@patidar.rahul8392: When I am checking on git I am able to see multiple files here, e.g. the Hadoop ingestion.yaml, schemaFile, configFile, etc.
@patidar.rahul8392: So I am not sure which files are required for creating a hybrid table.
@fx19880617: right, it's a local path
@fx19880617: Pinot queries are served from the servers which have local segments
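For context, a minimal sketch of the settings under discussion — the paths, hosts, and HDFS URIs below are placeholder assumptions, not values from this thread. The server properties are the local paths mentioned above; the deep store is configured on the controller side:
```
# server.conf -- local paths on each server (placeholder values)
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segmentTar

# controller.conf -- deep store on HDFS (placeholder values)
controller.data.dir=hdfs://namenode:8020/pinot/controller/data
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/etc/hadoop/conf
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```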
@patidar.rahul8392: Oh okay @fx19880617, so for loading the Pinot hybrid table I need to load data from the pinot.server.instance.dataDir location, right? And which files are required?
@fx19880617: Yes
@fx19880617: You need to push segments to the offline table; the segment files are all backed up in the deep store and also loaded by every server
@patidar.rahul8392: These files are available
@patidar.rahul8392: Which configuration files need to be created, @fx19880617? I am able to see these 5 files on git.
@fx19880617: for the batch job, you need to run an ingestion job to push data to the Pinot offline table
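A minimal sketch of such a batch ingestion job spec reading from HDFS — the table name, paths, and hosts are placeholder assumptions, and Avro input is assumed. Note that the HDFS source location goes in this job spec, not in the offline table config:
```
executionFrameworkSpec:
  name: 'hadoop'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'
  extraConfigs:
    stagingDir: 'hdfs://namenode:8020/pinot/staging'   # scratch space for the Hadoop job
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://namenode:8020/data/myTopic/'      # the HDFS source location
includeFileNamePattern: 'glob:**/*.avro'
outputDirURI: 'hdfs://namenode:8020/pinot/segments/myTable/'
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/etc/hadoop/conf'
recordReaderSpec:
  dataFormat: 'avro'
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'
tableSpec:
  tableName: 'myTable'
  schemaURI: 'http://controller:9000/tables/myTable/schema'
  tableConfigURI: 'http://controller:9000/tables/myTable'
pinotClusterSpecs:
  - controllerURI: 'http://controller:9000'
```
The job can then be launched with `bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ingestionJobSpec.yaml`.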
@patidar.rahul8392: In offline-table-config.json I could find only these details.
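For comparison, a minimal sketch of what an offline table config typically contains — names and values here are placeholders. There is deliberately no input/source location in it; that lives in the ingestion job spec above:
```
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "myTable",
    "timeColumnName": "eventTimeMs",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": { "loadMode": "MMAP" },
  "metadata": {}
}
```
For a hybrid setup, this offline table simply shares its name with the realtime table, and the broker federates queries across the two.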
@prabha.cloud: @prabha.cloud has joined the channel
@szecsip94: @szecsip94 has joined the channel
@rohit.agarwalla: @rohit.agarwalla has joined the channel
@savio.teles: @savio.teles has joined the channel
@savio.teles: Hi! I have two dimensions (customers and sellers) with a fact table with order data. We would like to aggregate the order data by customer and seller, e.g. the aggregate order amount. We would like to use the star-tree index, but a customer can change at any time (name, address, etc.), and the Pinot documentation says that upsert is not supported with the star-tree index.
@mayanks: Have you tried it without the star-tree index? From your description, it seems that regular indexing should work just fine?
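A sketch of what that "regular indexing" might look like in the table config — the column names are assumptions based on the customer/seller description:
```
"tableIndexConfig": {
  "invertedIndexColumns": ["customerId", "sellerId"],
  "loadMode": "MMAP"
}
```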
@savio.teles: Thanks, we will give it a try. One other question is whether we should pre-aggregate the data, or whether it makes sense to do the join between customers and sellers during the query using Presto, because the customer data can change at any time (such as the customer address).
@g.kishore: Try lookup join feature in Pinot
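Pinot's lookup join is exposed through the LOOKUP scalar function against a dimension table. A sketch, where the table and column names are assumptions for illustration:
```
-- Assumes 'customers' is a Pinot dimension table (isDimTable=true) keyed on customerId
SELECT customerId,
       LOOKUP('customers', 'customerName', 'customerId', customerId) AS customerName,
       SUM(orderAmount) AS totalAmount
FROM orders
GROUP BY customerId, LOOKUP('customers', 'customerName', 'customerId', customerId)
```
Because the dimension table is replicated to the servers and looked up at query time, a customer's name or address change is picked up without rewriting the fact data.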
@savio.teles: tks, @g.kishore.
#random
@prabha.cloud: @prabha.cloud has joined the channel
@szecsip94: @szecsip94 has joined the channel
@rohit.agarwalla: @rohit.agarwalla has joined the channel
@savio.teles: @savio.teles has joined the channel
#troubleshooting
@prabha.cloud: @prabha.cloud has joined the channel
@pedro.cls93: Hello, when defining a date-time field from a string as: ```"dateTimeFieldSpecs": [{ "name": "dateOfBirth", "dataType": "STRING", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'", "granularity": "1:DAYS" }, ...]``` should you be able to apply date-time functions such as `year(dateOfBirth)` in a query?
@mayanks: This is because `year` expects a long, and the data is stored in string format.
@pedro.cls93: I thought the field was converted from its string representation during ingestion to an internal date-time representation that could then be used with any date-time function.
@pedro.cls93: To use those transformations, should I first perform some transformation on the field and then store it as a long (ms since epoch)?
@mayanks: Oh, in that case can you try `DATETIMECONVERT` with the granularity as year, in the query?
@pedro.cls93: I don’t understand, can you clarify?
@mayanks: In the query, instead of `year`, use `DATETIMECONVERT`
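A sketch of what that query might look like, using the `dateOfBirth` field from above; the table name is a placeholder. Since the stored value is a string, a SIMPLE_DATE_FORMAT output of `yyyy` is one way to extract the year (note the doubled single quotes needed inside SQL string literals):
```
SELECT DATETIMECONVERT(
         dateOfBirth,
         '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd''T''HH:mm:ss''Z''',
         '1:DAYS:SIMPLE_DATE_FORMAT:yyyy',
         '1:DAYS') AS birthYear
FROM myTable
```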
@pedro.cls93: Is there any way to specify a field as a date-time field that is presented to the user as a human-readable string while still being able to use the date-time transformations?
@pedro.cls93: I.e. shown to the user in the format yyyy-MM-dd, on which they can do `year(field)` or `month(field)`
@pedro.cls93: Where a query like: `Select year(dateBirth), dateBirth from table where...` outputs: `1982 | 1982-02-13`
@mayanks: Not that I am aware of. Seems like a good enhancement, want to file an issue?
@pedro.cls93: Sure
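Going the other route raised earlier in the thread — storing the value as epoch millis at ingestion time — a sketch of an ingestion transform in the table config might look like the following. The derived column name is an assumption, and the pattern is simplified to `yyyy-MM-dd` for readability (the full ISO pattern with literal T/Z needs appropriate quote escaping inside the expression):
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "dateOfBirthMs",
      "transformFunction": "fromDateTime(dateOfBirth, 'yyyy-MM-dd')"
    }
  ]
}
```
With `dateOfBirthMs` declared as a LONG epoch-millis field in the schema, `year(dateOfBirthMs)` should then work directly in queries.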
@szecsip94: @szecsip94 has joined the channel
@rohit.agarwalla: @rohit.agarwalla has joined the channel
@savio.teles: @savio.teles has joined the channel
#getting-started
@rohit.agarwalla: @rohit.agarwalla has joined the channel