Hi Team,

Thank you very much for your help and support. I'm facing an issue loading a large dataset into a single-node Druid setup: the data load job completes successfully, but only part of the data gets loaded.

Machine config: Ubuntu 16.04 LTS, 4 CPUs, 16 GB RAM, 500 GB disk

CSV file size: 290 MB
Total rows: 1 million
Total columns: 21
Total dimensions: 21

Druid data ingestion config:

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "data/",
        "filter" : "0_1000000.csv"
      },
      "appendToExisting" : false
    },
    "dataSchema" : {
      "dataSource" : "data_1_million",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "day",
        "intervals" : ["2018-08-01/2018-10-19"],
        "rollup" : true
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "hasHeaderRow" : true,
          "dimensionsSpec" : {
            "dimensions" : [
              "col_1",
              "col_2",
              "col_3",
              "col_4",
              "col_5",
              "col_6",
              "col_7",
              "col_8",
              "col_9",
              "col_10",
              "col_11",
              "col_12",
              "col_13",
              "col_14",
              "col_15",
              "col_16",
              "col_17",
              "date_col_18",
              "date_col_19",
              "col_20",
              "col_21"
            ]
          },
          "timestampSpec" : {
            "column" : "date_col_19"
          }
        }
      },
      "metricsSpec" : []
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 50000000,
      "maxRowsInMemory" : 10000000,
      "forceExtendableShardSpecs" : true
    }
  }
}
The ingestion job reports success, but only about 0.7 million of the 1 million rows end up in the datasource.
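One thing I have been wondering about: Druid silently discards rows whose timestamp cannot be parsed or falls outside granularitySpec.intervals (they show up as "thrownAway"/"unparseable" in the task report rather than failing the job), which could account for the missing 0.3 million rows. Below is a small sanity-check sketch I could run against the raw CSV; the timestamp format and column name are assumptions and would need to match the actual data:

```python
from datetime import datetime

# Interval from the ingestion spec: ["2018-08-01", "2018-10-19").
# Druid interval end bounds are exclusive.
INTERVAL_START = datetime(2018, 8, 1)
INTERVAL_END = datetime(2018, 10, 19)

def classify(ts_text):
    """Return 'in', 'out', or 'bad' for one date_col_19 value."""
    try:
        # Assumed timestamp format -- adjust to the actual CSV data.
        ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return "bad"
    return "in" if INTERVAL_START <= ts < INTERVAL_END else "out"

# Running this over the real file would look something like:
# import csv
# from collections import Counter
# with open("data/0_1000000.csv", newline="") as f:
#     print(Counter(classify(r["date_col_19"]) for r in csv.DictReader(f)))
```

If the 'out' + 'bad' counts come to roughly 0.3 million, that would explain the gap without any ingestion error.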
Your help and any pointers to solve this issue would be greatly appreciated.

Thanks and regards,
Kiran Jagtap

