ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r526651940
##########
File path:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job,
AbsoluteTableIdentifi
public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
this.readCommittedScope = readCommittedScope;
}
+
+ public String getSegmentIdFromFilePath(String filePath) {
Review comment:
Already `CarbonTablePath#getSegmentIdFromPath` method exist. check if
you can use it.
##########
File path:
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -781,6 +781,26 @@ class AddSegmentTestCase extends QueryTest with
BeforeAndAfterAll {
sql(s"drop table $tableName")
}
+ test("Test add segment by carbon written by sdk having old timestamp") {
+ sql(s"drop table if exists external_primitive")
+ sql(
+ s"""
+ |create table external_primitive (id int, name string, rank smallint,
salary double,
+ | active boolean, dob date, doj timestamp, city string, dept string)
stored as carbondata
+ |""".stripMargin)
+ val externalSegmentPathWithOldTimestamp = storeLocation + "/" +
+ "external_segment_with_old_timestamp"
+ val externalSegmentPath = storeLocation + "/" + "external_segment"
+ FileFactory.deleteAllFilesOfDir(new File(externalSegmentPath))
+ copy(externalSegmentPathWithOldTimestamp, externalSegmentPath)
Review comment:
We should not keep binary files in the repo (like carbondata and index
files). It has to generate on the fly.
Try to write from SDK and use nano time timeStamp by calling
`CarbonWriterBuilder#uniqueIdentifier`
##########
File path:
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##########
@@ -367,8 +368,9 @@ protected FileSplit makeSplit(String segmentId, String
filePath, long start, lon
String[] deleteDeltaFilePath = null;
if (isIUDTable) {
// In case IUD is not performed in this table avoid searching for
- // invalidated blocks.
- if (CarbonUtil
+ // invalidated blocks. No need to check validation for splits written
by SDK.
+ String segmentId = getSegmentIdFromFilePath(inputSplit.getFilePath());
Review comment:
add more comments explaining SDK case segment id will always be null and
using that as a decision point to skip this validation for SDK and why you need
to skip it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]