Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2977#discussion_r239344495
--- Diff:
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/AggregateDataMapCompactor.scala
---
@@ -79,9 +79,20 @@ class AggregateDataMapCompactor(carbonLoadModel:
CarbonLoadModel,
CarbonSession.threadSet(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP,
"true")
loadCommand.processData(sqlContext.sparkSession)
- val newLoadMetaDataDetails = SegmentStatusManager.readLoadMetadata(
+ val oldMetadataDetails = SegmentStatusManager.readLoadMetadata(
+ carbonTable.getMetadataPath, "")
+ val newMetadataDetails = SegmentStatusManager.readLoadMetadata(
carbonTable.getMetadataPath, uuid)
- val updatedLoadMetaDataDetails = newLoadMetaDataDetails collect {
+ val mergedContent = oldMetadataDetails.collect {
+ case content =>
+ val contentIndex = newMetadataDetails.indexOf(content)
--- End diff --
How can you make sure the segment data from `newMetadataDetails` is always
the latest in a concurrent scenario?
I feel you should take only the segments that need to be updated from
`newMetadataDetails`.
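
A rough sketch of the idea, not the actual patch: `SegmentDetail`, `mergeDetails` and `segmentsToUpdate` are hypothetical names used only for illustration, with `SegmentDetail` standing in for the real load metadata entries. It assumes the compaction knows which segment names it touched, and it keeps every other entry exactly as read from the current table status:

```scala
// Hypothetical stand-in for a load metadata entry; only the fields needed here.
case class SegmentDetail(loadName: String, status: String)

object MergeSketch {
  // Start from the current table status and take from the uuid file only the
  // segments this compaction is responsible for. Everything else is kept as
  // it is in the current status, so concurrent changes are not overwritten.
  def mergeDetails(
      current: Seq[SegmentDetail],
      fromUuidFile: Seq[SegmentDetail],
      segmentsToUpdate: Set[String]): Seq[SegmentDetail] = {
    val updates = fromUuidFile.filter(d => segmentsToUpdate.contains(d.loadName))
    val updatesByName = updates.map(d => d.loadName -> d).toMap
    val currentNames = current.map(_.loadName).toSet

    // Replace the touched segments in place, preserving the original order.
    val replaced = current.map(d => updatesByName.getOrElse(d.loadName, d))
    // Append touched segments that do not exist yet (e.g. the merged segment).
    val added = updates.filterNot(d => currentNames.contains(d.loadName))
    replaced ++ added
  }
}
```

With this shape, a segment loaded concurrently (and therefore stale or missing in the uuid file) keeps whatever status the current table status file has for it, because it is not in `segmentsToUpdate`.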
---