zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r628697141



##########
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##########
@@ -488,8 +503,35 @@ public Status checkIndex() throws IOException {
    *     quite a long time to run.
    */
   public Status checkIndex(List<String> onlySegments) throws IOException {

Review comment:
       > please don't overload it here. force callers to pass executor.
   
   I took a quick look at these methods' current usage. `checkIndex()` has 4 
references in tests and 1 in luke module, and `checkIndex(onlySegments)` has 6 
references in tests and 1 in `CheckIndex` main, so it should be straightforward 
to remove these convenience methods and have all callers to invoke 
`checkIndex(onlySegments, executorService)`. Having said that, it does seems to 
be a burden to put on each caller to handle executor creation and shutdown each 
time this method gets called though. Maybe we could keep these methods and use 
the 1 default threadcount executor like you suggested if one is not explicitly 
passed in (I also updated the default threadCount to be 1 in my latest commit) ?
   
   > I am still worried that tests are using multiple threads here which must 
not happen: on an 8 core machine we don't want to use 8 jvms * 8 threads each.
   
   Oh I didn't realize it would spawn multiple jvm processes, or are you 
suggesting that if this gets run in multiple jvm processes for some reasons 
then the overall overhead will be high (which I agree!) ? Shall we do something 
like `cpu / constant` to reduce it a bit, or any other mechanism we can use to 
better control the number of threads based on whole system's overhead?
   
   > I am also concerned about newly-created synchronization issues here (e.g. 
with output). If checkindex fails it is really important that we can read this 
output.
   
   Yeah definitely agree with this. For the output messages synchronization 
issue, I put in some nocommit comment earlier and proposed to also include 
segment / part id in these messages as a cheap way to allow manual sorting / 
grep-ing after the messages were written, to essentially work around this 
issue. Not sure if this will be good enough? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to