zacharymorn commented on a change in pull request #128: URL: https://github.com/apache/lucene/pull/128#discussion_r628697141
########## File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java ########## @@ -488,8 +503,35 @@ public Status checkIndex() throws IOException { * quite a long time to run. */ public Status checkIndex(List<String> onlySegments) throws IOException { Review comment: > please don't overload it here. force callers to pass executor. I took a quick look at these methods' current usage. `checkIndex()` has 4 references in tests and 1 in luke module, and `checkIndex(onlySegments)` has 6 references in tests and 1 in `CheckIndex` main, so it should be straightforward to remove these convenience methods and have all callers to invoke `checkIndex(onlySegments, executorService)`. Having said that, it does seems to be a burden to put on each caller to handle executor creation and shutdown each time this method gets called though. Maybe we could keep these methods and use the 1 default threadcount executor like you suggested if one is not explicitly passed in (I also updated the default threadCount to be 1 in my latest commit) ? > I am still worried that tests are using multiple threads here which must not happen: on an 8 core machine we don't want to use 8 jvms * 8 threads each. Oh I didn't realize it would spawn multiple jvm processes, or are you suggesting that if this gets run in multiple jvm processes for some reasons then the overall overhead will be high (which I agree!) ? Shall we do something like `cpu / constant` to reduce it a bit, or any other mechanism we can use to better control the number of threads based on whole system's overhead? > I am also concerned about newly-created synchronization issues here (e.g. with output). If checkindex fails it is really important that we can read this output. Yeah definitely agree with this. For the output messages synchronization issue, I put in some nocommit comment earlier and proposed to also include segment / part id in these messages as a cheap way to allow manual sorting / grep-ing after the messages were written, to essentially work around this issue. Not sure if this will be good enough? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org