syed72 opened a new issue #6864:
URL: https://github.com/apache/incubator-pinot/issues/6864


   Hello, data ingestion is not working for data with JSON data types. No segments are getting created.
   I followed the steps in this StackOverflow question:
   
   https://stackoverflow.com/questions/65886253/pinot-nested-json-ingestion
   
   Even the example given for JSON data types in the build (githubEvents) is not working:
   https://github.com/apache/incubator-pinot/tree/master/pinot-tools/src/main/resources/examples/batch/githubEvents

   Schema file:
   
   {
     "metricFieldSpecs": [],
     "dimensionFieldSpecs": [
       {
         "dataType": "STRING",
         "name": "name"
       },
       {
         "dataType": "LONG",
         "name": "age"
       },
       {
         "dataType": "STRING",
         "name": "subjects_str"
       },
       {
         "dataType": "STRING",
         "name": "subjects_name",
         "singleValueField": false
       },
       {
         "dataType": "STRING",
         "name": "subjects_grade",
         "singleValueField": false
       }
     ],
     "dateTimeFieldSpecs": [],
     "schemaName": "myTable"
   }
   
   Table Config:
   
   {
       "tableName": "myTable",
       "tableType": "OFFLINE",
       "segmentsConfig": {
           "segmentPushType": "APPEND",
           "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
           "schemaName": "myTable",
           "replication": "1"
       },
       "tenants": {},
       "tableIndexConfig": {
           "loadMode": "MMAP",
           "invertedIndexColumns": [],
           "noDictionaryColumns": [
               "subjects_str"
           ],
           "jsonIndexColumns": [
               "subjects_str"
           ]
       },
       "metadata": {
           "customConfigs": {}
       },
       "ingestionConfig": {
           "batchIngestionConfig": {
               "segmentIngestionType": "APPEND",
               "segmentIngestionFrequency": "DAILY",
               "batchConfigMaps": [],
               "segmentNameSpec": {},
               "pushSpec": {}
           },
           "transformConfigs": [
               {
                   "columnName": "subjects_str",
                   "transformFunction": "jsonFormat(subjects)"
               },
               {
                   "columnName": "subjects_name",
                   "transformFunction": "jsonPathArray(subjects, '$.[*].name')"
               },
               {
                   "columnName": "subjects_grade",
                   "transformFunction": "jsonPathArray(subjects, '$.[*].grade')"
               }
           ]
       }
   }

   Data.json:

{"name":"Pete","age":24,"subjects":[{"name":"maths","grade":"A"},{"name":"maths","grade":"B--"}]}
   
{"name":"Pete1","age":23,"subjects":[{"name":"maths","grade":"A+"},{"name":"maths","grade":"B--"}]}
   
{"name":"Pete2","age":25,"subjects":[{"name":"maths","grade":"A++"},{"name":"maths","grade":"B--"}]}
   
{"name":"Pete3","age":26,"subjects":[{"name":"maths","grade":"A+++"},{"name":"maths","grade":"B--"}]}
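
For reference, the three transform functions can be approximated in plain Python to show the values each derived column should receive for the first record. This is only a sketch: `json_format` and `json_path_array` are hypothetical stand-ins, not Pinot's implementation, and the path handling covers only the `$.[*].<key>` form used in the table config.

```python
import json

# First record from Data.json above.
record = json.loads(
    '{"name":"Pete","age":24,"subjects":'
    '[{"name":"maths","grade":"A"},{"name":"maths","grade":"B--"}]}'
)


def json_format(value):
    """Hypothetical stand-in for jsonFormat: serialize a value to a JSON string."""
    return json.dumps(value, separators=(",", ":"))


def json_path_array(items, key):
    """Hypothetical stand-in for jsonPathArray(items, '$.[*].<key>')."""
    return [item[key] for item in items if key in item]


subjects_str = json_format(record["subjects"])
subjects_name = json_path_array(record["subjects"], "name")
subjects_grade = json_path_array(record["subjects"], "grade")

print(subjects_str)
print(subjects_name)
print(subjects_grade)
```

Applied to all four records, this gives one distinct value for `subjects_name` and five for `subjects_grade`, which matches the dictionary cardinalities reported in the job output.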
   
   Please help me rectify this issue.

   Ingestion job output (no error):
   
   bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile 
/home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/ingestionJobSpec.yaml
 
   SegmentGenerationJobSpec: 
   !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
   cleanUpOutputDir: false
   excludeFileNamePattern: null
   executionFrameworkSpec: {extraConfigs: null, name: standalone, 
segmentGenerationJobRunnerClassName: 
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
     segmentMetadataPushJobRunnerClassName: null, 
segmentTarPushJobRunnerClassName: 
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
     segmentUriPushJobRunnerClassName: 
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
   includeFileNamePattern: glob:**/*.json
   inputDirURI: examples/batch/jsontype/rawdata
   jobType: SegmentCreationAndTarPush
   outputDirURI: examples/batch/jsontype/segments
   overwriteOutput: true
   pinotClusterSpecs:
   - {controllerURI: 'http://localhost:9000'}
   pinotFSSpecs:
   - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, 
scheme: file}
   pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 
1000,
     segmentUriPrefix: null, segmentUriSuffix: null}
   recordReaderSpec: {className: 
org.apache.pinot.plugin.inputformat.json.JSONRecordReader,
     configClassName: null, configs: null, dataFormat: json}
   segmentCreationJobParallelism: 0
   segmentNameGeneratorSpec: null
   tableSpec: {schemaURI: 'http://localhost:9000/tables/myTable/schema', 
tableConfigURI: 'http://localhost:9000/tables/myTable',
     tableName: myTable}
   tlsSpec: null
   Trying to create instance for class 
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
   Creating an executor service with 1 threads(Job parallelism: 0, available 
cores: 40.)
   Initializing PinotFS for scheme file, classname 
org.apache.pinot.spi.filesystem.LocalPinotFS
   Submitting one Segment Generation Task for 
file:/home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/rawdata/test.json
   Initialized FunctionRegistry with 119 functions: [fromepochminutesbucket, 
arrayunionint, codepoint, mod, sha256, year, yearofweek, upper, 
arraycontainsstring, arraydistinctstring, bytestohex, tojsonmapstr, trim, 
timezoneminute, sqrt, togeometry, normalize, fromepochdays, arraydistinctint, 
exp, jsonpathlong, yow, toepochhoursrounded, lower, toutf8, concat, ceil, 
todatetime, jsonpathstring, substr, dayofyear, contains, jsonpatharray, 
arrayindexofint, fromepochhoursbucket, arrayindexofstring, minus, 
arrayunionstring, toepochhours, toepochdaysrounded, millisecond, 
fromepochhours, arrayreversestring, dow, doy, min, toepochsecondsrounded, 
strpos, jsonpath, tosphericalgeography, fromepochsecondsbucket, max, reverse, 
hammingdistance, stpoint, abs, timezonehour, toepochseconds, arrayconcatint, 
quarter, md5, ln, toepochminutes, arraysortstring, replace, strrpos, 
jsonpathdouble, stastext, second, arraysortint, split, fromepochdaysbucket, 
lpad, day, toepochminutesrounded, fromdatetime, fromepochseconds, 
arrayconcatstring, base64encode, ltrim, arraysliceint, chr, sha, 
plus, base64decode, month, arraycontainsint, toepochminutesbucket, startswith, 
week, jsonformat, sha512, arrayslicestring, fromepochminutes, remove, 
dayofmonth, times, hour, rpad, arrayremovestring, now, divide, 
bigdecimaltobytes, floor, toepochsecondsbucket, toepochdaysbucket, hextobytes, 
rtrim, length, toepochhoursbucket, bytestobigdecimal, toepochdays, 
arrayreverseint, datetrunc, minute, round, dayofweek, arrayremoveint, 
weekofyear] in 942ms
   Using class: org.apache.pinot.plugin.inputformat.json.JSONRecordReader to 
read segment, ignoring configured file format: AVRO
   Finished building StatsCollector!
   Collected stats for 4 documents
   Using fixed length dictionary for column: subjects_grade, size: 20
   Created dictionary for STRING column: subjects_grade with cardinality: 5, 
max length in bytes: 4, range: A to B--
   Using fixed length dictionary for column: subjects_name, size: 5
   Created dictionary for STRING column: subjects_name with cardinality: 1, max 
length in bytes: 5, range: maths to maths
   Using fixed length dictionary for column: name, size: 20
   Created dictionary for STRING column: name with cardinality: 4, max length 
in bytes: 5, range: Pete to Pete3
   Created dictionary for LONG column: age with cardinality: 4, range: 23 to 26
   Start building IndexCreator!
   Finished records indexing in IndexCreator!
   Trying to create instance for class 
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
   Initializing PinotFS for scheme file, classname 
org.apache.pinot.spi.filesystem.LocalPinotFS
   Start pushing segments: []... to locations: 
[org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@4e31276e] for table 
myTable.
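
The empty list in `Start pushing segments: []` suggests that the push step found no segment archives. Below is a minimal sketch for checking the output directory. It assumes segment archives are written as `.tar.gz` files under `outputDirURI`, which is an assumption about the job's on-disk layout and not something the log shows. `find_segment_tars` is a hypothetical helper.

```python
from pathlib import Path


def find_segment_tars(output_dir):
    """Return sorted paths of *.tar.gz files under output_dir (recursive)."""
    root = Path(output_dir)
    if not root.is_dir():
        # A missing directory is treated the same as an empty one.
        return []
    return sorted(str(p) for p in root.rglob("*.tar.gz"))


if __name__ == "__main__":
    # outputDirURI from the job spec; it is relative, so it resolves
    # against the directory the job was launched from.
    tars = find_segment_tars("examples/batch/jsontype/segments")
    print(tars if tars else "no segment archives found")
```

Because `outputDirURI` is relative in the job spec, the check needs to run from the same working directory as the ingestion job.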
   
   Reference:
   https://apache-pinot.slack.com/archives/C011C9JHN7R/p1619532619119600




