jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595406480
##########
File path:
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##########
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
sql("drop table if exists maintable")
}
+ test("reindex command with stale files") {
+ sql("drop table if exists maintable")
+ sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+ sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+ sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+ sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")
Review comment:
1. "we shouldn't allow delete segments on index table itself." Please
refer to the second-to-last comment, right before yours. Anyone who has
resolved production issues would not say this. Thousands of query failures
have been caused solely by broken SI segments. We first need to delete the
broken SI segment and then repair it (over the last two to three years there
have been countless issues caused by SI segments that were broken or out of
sync with the main table). So please get to know how customers use the
software rather than building it without that knowledge. And when coding,
please also take the maintainer's point of view and implement the feature with
maintainability in mind. Thanks.
2. "And during repair index, if have segment with partial data, we should
delete the segment completely(segment folder, segment file, probably
tablestatus entry for the segment as well) before proceeding with segment
repair." Your suggestion is of course right, but it is much more complicated
than the existing implementation, so please do a complete analysis and design
first, and then we can discuss and plan the next step.