sanjdow opened a new issue, #18606: URL: https://github.com/apache/druid/issues/18606
Running Druid on an OpenShift cluster and using the Druid Delta Lake extension (https://github.com/apache/druid/tree/master/extensions-contrib/druid-deltalake-extensions) to connect to and load Delta tables. Facing the following issues:

- Error while loading with the Delta connector: only 1024 records of each constituent Parquet file (each partition of the Delta table) are loaded.
- There is also an error on the UI as soon as the load is over: `ERROR: Request failed with status code 404`. This may be unrelated to the issue with the Parquet data load.

### Affected Version

33.0.0

### Description

- Running Druid on an OpenShift cluster with the Datainfra Druid operator, version 0.3.8.
- Trying to load a Delta table from Parquet using an MSQ load with the query context below:

```json
{
  "finalizeAggregations": false,
  "groupByEnableMultiValueUnnesting": false,
  "arrayIngestMode": "array",
  "maxNumTasks": 11,
  "externalDataSampleRows": 0,
  "taskStatusCheckPeriodMs": 5000,
  "sqlInsertTaskNumSlots": "max"
}
```

- Query used for the load:

```sql
REPLACE INTO "table" OVERWRITE ALL
WITH "ext" AS (
  SELECT *
  FROM TABLE(
    EXTERN(
      '{"type":"delta","tablePath":"path"}',
      '{"type":"parquet"}'
    )
  ) EXTEND ("col1" VARCHAR, "col2" VARCHAR, "col3" VARCHAR, "col4" BIGINT, "col5" VARCHAR, "col6" VARCHAR, "col7" BIGINT, "col8" VARCHAR, "col9" VARCHAR)
)
SELECT
  MILLIS_TO_TIMESTAMP("dop" * 1000) AS "__time",
  "col1",
  "col2",
  "col3",
  "col4",
  "col5",
  "col6",
  "col7",
  "col8"
FROM "ext"
PARTITIONED BY DAY
```
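For reference, one way to double-check the per-file counts on the source side is to read each constituent Parquet file's row count from its footer metadata. This is a minimal sketch, not part of the original report: it assumes local access to the Delta table's data files and that `pyarrow` is installed, and `"path"` is the same placeholder table path used in the `EXTERN` spec above.

```python
import glob

import pyarrow.parquet as pq

# "path" is the placeholder Delta table path from the EXTERN spec above.
# Delta tables store data as Parquet files, possibly nested under
# partition directories, hence the recursive glob.
for f in sorted(glob.glob("path/**/*.parquet", recursive=True)):
    # num_rows comes from the Parquet footer metadata, so no data pages
    # are read; compare these counts against what Druid ingested.
    print(f, pq.ParquetFile(f).metadata.num_rows)
```

If each source file reports well over 1024 rows while the ingested datasource gains exactly 1024 rows per file, that would isolate the truncation to the Delta input source read path rather than the files themselves.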
