Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2513#discussion_r202983275
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -253,14 +253,16 @@ case class CarbonLoadDataCommand(
            }
            // First system has to partition the data first and then call the load data
            LOGGER.info(s"Initiating Direct Load for the Table : ($dbName.$tableName)")
    -        // Clean up the old invalid segment data before creating a new entry for new load.
    -        SegmentStatusManager.deleteLoadsAndUpdateMetadata(table, false, currPartitions)
    -        // add the start entry for the new load in the table status file
    -        if (updateModel.isEmpty && !table.isHivePartitionTable) {
    -          CarbonLoaderUtil.readAndUpdateLoadProgressInTableMeta(
    -            carbonLoadModel,
    -            isOverwriteTable)
    -          isUpdateTableStatusRequired = true
    +        CarbonLoadDataCommand.synchronized {
    --- End diff --
    
    Checking the segment status file to detect parallel loading cannot work, because it cannot distinguish whether a loading job is still running or has been killed. So the only reliable way to detect a parallel load is a lock.
    First, identify tables with a dictionary column (not a direct dictionary) as tables that cannot support parallel load.
    Then, for those tables, data loading should acquire a lock.
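
    The lock-based guard described above could be sketched roughly as follows. This is only an illustrative in-JVM sketch under stated assumptions, not CarbonData's actual lock implementation (which must work across processes); the names `TableLoadLock`, `withLoadLock`, and `needsSerialLoad` are hypothetical:

    ```scala
    import java.util.concurrent.locks.ReentrantLock
    import scala.collection.concurrent.TrieMap

    // Hypothetical sketch: one lock per table, acquired only when the table
    // has a (non-direct) dictionary column, since only those tables cannot
    // support parallel load.
    object TableLoadLock {
      private val locks = TrieMap.empty[String, ReentrantLock]

      def withLoadLock[T](tableId: String, needsSerialLoad: Boolean)(load: => T): T = {
        if (!needsSerialLoad) {
          // Tables without dictionary columns may load in parallel.
          load
        } else {
          val lock = locks.getOrElseUpdate(tableId, new ReentrantLock())
          lock.lock() // blocks until any concurrent load on this table finishes
          try load finally lock.unlock()
        }
      }
    }
    ```

    With this shape, concurrent loads on a dictionary table serialize on the per-table lock, while loads on other tables run unblocked.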

