[
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303367#comment-15303367
]
Hoss Man commented on SOLR-5944:
--------------------------------
Ok ... more in depth comments reviewing the latest patch (ignoring some of the
general higher level stuff i've previously commented on).
(So far i've still focused on reviewing the tests, because we should make sure
they're rock solid before any discussion of refacoting/improving/changing the
code)
----
* in general, all these tests seem to depend on autoCommit being disabled, and
use a config that is setup that way, but don't actaully assert that it's true
in case someone changes the configs in the future
** TestInPlaceUpdate can get direct access to the SolrCore verify that for
certain to
** the distrib tests might be able to use one of hte new cnfig APIs to check
this (i don't know off the top of my head)
*** at a minimum define a String constant for the config file name in
TestInPlaceUpdate and refer to it in the other tests where the same config is
expected with a comment explaining that we're *assuming* it has autoCommit
disabled and that TestInPlaceUpdate will fail if it does not.
* TestInPlaceUpdate
** SuppressCodecs should be removed
** should at least have class level javadocs explaining what's being tested
** testUpdatingDocValues
*** for addAndGetVersion calls where we don't care about the returned version,
don't bother assigning it to a variable (distracting)
*** for addAndGetVersion calls where we do care about the returned version, we
need check it for every update to that doc...
**** currently version1 is compared to newVersion1 to assert that an update
incrememnted the version, but in between those 2 updates are 4 other places
where that document was updated -- we have to assert it has the expected value
(either the same as before, or new - and if new record it) after all of those
addAndGetVersion calls, or we can't be sure where/why/how a bug exists if that
existing comparison fails.
**** ideally we should be asserting the version of every doc when we query it
right along side the assertion for it's updated "ratings" value
*** most of the use of "field(ratings)" can probbaly just be replaced with
"ratings" now that DV are returnable -- although it's nice to have both
included in the test at least once to demo that both work, but when doing that
there should be a comment making it clear why
** testOnlyPartialUpdatesBetweenCommits
*** ditto comment about checking return value from addAndGetVersion
*** this also seems like a good place to to test if doing a redundent atomic
update (either via set to the same value or via inc=0) returns a new version or
not -- should it?
** DocInfo should be a private static class and have some javadocs
** based on how testing has gone so far, and the discover of LUCENE-7301 it
seems clear that adding even single thread, single node, randomized testing of
lots of diff types of add/update calls would be good to add
*** we could refactor/improve the "checkReplay" function I added in the last
patch to do more testing of a randomly generated Iterable of "commands"
(commit, doc, doc+atomic mutation, etc...)
*** and of course: improve checkReplay to verify RTG against hte uncommited
model as well
*** testReplayFromFile and getSDdoc should probably go away once we have more
structured tests for doing this
** createMap can be elimianted -- callers can just use SolrTestCaseJ4.map(...)
** In general the tests in this class should include more queries / sorting
against the updated docvalues field after commits to ensure that the updated
value is searchable & sortable
** Likewise the test methods in this class should probably have a lot more RTG
checks -- with filter queries that constrain against the updated docvalues
field, and checks of the expected version field -- to ensure that is all
working properly.
* InPlaceUpdateDistribTest
** SuppressCodecs should be removed
** should at least have class level javadocs explaining what's being tested
** Once LUCENE-7301 is fixed and we can demonstate that this passes reliably
all of the time, we should ideally refactor this to subclass SolrCloudTestCase
** in general, the "pick a random client" logic should be refactored so that
sometimes it randomly picks a CloudSolrClient
** there should almost certianly be some "delete all docs and optimize" cleanup
in between all of these tests
*** easy to do in an @Before method if we refactor to subclass SolrCloudTestCase
** docValuesUpdateTest
*** should randomize numdocs
*** we need to find away to eliminate the hardcoded "Thread.sleep(500);"
calls...
**** if initially no docs have a rating value, then make the (first) test query
be for {{rating:\[\* TO \*\]}} and execute it in a rety loop until the numFound
matches numDocs.
**** likewise if we ensure all ratings have a value such that abs(ratings) < X,
then the second update can use an increment such that abs(inc) > X\*3 and we
can use {{-ratings:\[-X TO X\]}} as the query in a retry loop
** ensureRtgWorksWithPartialUpdatesTest
*** even if we're only going to test one doc, we should ensure there are a
random num docs in the index (some before the doc we're editing, and some after)
*** if we're testing RTG, then we should be testing the version returned from
every /get call against the last version returned from every update
** outOfOrderUpdatesIndividualReplicaTest
*** ditto comments about only one doc
*** ditto comments about testing the expected version in RTG requests
*** if we're sending updates direct to replicas to test how they handle out of
order updates, then something better assert exactly where the leader is hosted
and ensure we don't send to it by mistake
*** what's the point of using a threadpool and SendUpdateToReplicaTask here?
why not just send the updates in a (randdomly assigned) determinisitc order?
**** if we are going to use an ExecutorService, then the result of
awaitTermination has to be checked
**** ... and shutdown & awaitTermination have to be called in that order
*** since this tests puts replicas out of sync, a "delete all docs and
optimize" followed up "wait for recovers" should happen at the end of this test
(or just in between every test) .. especially if we refactor it (or to protect
someone in the future who might refactor it)
** delayedReorderingFetchesMissingUpdateFromLeaderTest
*** ditto previous comments about using a threadpool and SendUpdateToReplicaTask
**** even more so considering the "Thread.sleep(100);" ... what's the pont of
using a threadpool if we want the requests to be sequential?
*** Is there no way we can programatically tell if LIR has kicked in? ...
pehaps by setting a ZK watch? ... this "Thread.sleep(500);" is no garuntee and
seens arbitrary.
**** at a minimum polling in a loop for the expected results seems better then
just a hardcoded sleep
** SendUpdateToReplicaTask
*** based on how it's used, i'm not really sure i see the point in this class,
but assuming it continues to exist...
*** constructor takes in a Random, but method uses that global {{random()}}
anyway.
**** should probably take in a seed, and construct it's own Random
**** {{random()}} ensures that each Thread gets it's own consistent Random
instance -- but in Callables like this each Thread having a consistent seed
doesn't help the reproducibility since there's no garutee which Threed from an
ThreadPool (Executor) will invoke call().
*** instead of returning true, this should be a Callable<UpdateResponse> and
call() should return the results of the request so the caller can assert it was
successful (via Future.get().getStatus())
** getReplicaValue
*** using SolrTestCaseJ4.params(...) would make this method a lot shorter
*** based on where/how this method is used, i don't understand why it returns
String instead of just Object
** assertReplicaValue
*** should take in some sort of assertion message and/or build/append an
assertion message using the clientId
*** if getReplicaValue returns an Object, this can take an "Object expected"
param and eliminate abonch of toString & string concating throughout the test
** simulatedUpdateRequest
*** if this method is going to assume that the only shard you ever want to
similulate an update to is SHARD1 then the method name should be
"simulatedUpdateRequestToShard1Replica"
*** better still - why not ask the DocRouter which shard this doc belongs in,
and fetch the leader URL tha way?
** most usage of "addFields" can just be replaced with a call to "sdoc(...)" to
simplify code
** replace createMap usage with SolrTestCaseJ4.map
** why override tearDown if we're just calling super?
** in general, i think this test would be a lot easier to read if there were
well named variables for HttpSolrClient instances pointed at specific replicas
(ie HttpSolrClient SHARD1_LEADER = ...; HttpSolrClient SHARD1_REPLICA1 = ...;
etc...) and passed those around to the various methods instead of magic ints
(ie: "1", "2") to refer to which index in the static clients list should be
used for a given update.
* TestStressInPlaceUpdates
** ditto comments from InPlaceUpdateDistribTest about regarding @SupressCodecs,
javadocs, and extending SolrCloudTestCase once LUCENE-7301 is fixed and we're
sure this test passes reliably.
*** also: we should really make this test use multiple shards
** stressTest
*** it would be a lot cleaner/clearer if we refactored these anonymous Thread
classes into private static final (inner) classes and instantiated them like
normal objects
**** makes it a lot easier to see what threads access/share what state
**** better still would be implementing these "workers" as Callable instances
and using an ExecutorService
*** "operations" comment is bogus (it's not just for queries)
*** maxConcurrentCommits WTF?
**** has a comment that it should be less the # warming threads, but does that
even make sense i na distrib test?
**** currently set to nWriteThreads -- so what's the point, when/how could we
ever possibly have more concurrent commits then the number of threads? doesn't
that just mean that at the moment every write thread is allowed to commit if it
wants to?
**** if there is a reason for it, then replaceing "numCommitting" with a {{new
Semaphore(maxConcurrentCommits)}} would probably make more sense
*** why is the "hardCommit start" logic calling both {{commit();}} and
{{clients.get(chosenClientIndex).commit();}} ?
*** I'm not convinced the "{{synchronize \{...\}; commit stuff; syncrhonize \{
... \};}}" sequence is actually thread safe...
**** T-W1: commit sync block 1: newCommittedModel = copy(model), version =
snapshotCount++;
**** T-W2: updates a doc and adds it to model
**** T-W1: commit
**** T-W1: commit sync block 2: committedModel = newCommittedModel
**** T-R3: read sync block: get info from committedModel
**** T-R3: query for doc
**** ...
*** ... in the above sequence, query results seen by thread T-R3 won't match
the model because the update from T-W2 made it into the index before the
commit, but after the model was copied
**** i guess it's not a huge problem because the query thread doesn't bother to
assert anything unless the versions match -- but that seems kind of risky ...
we could theoretically never assert anything
*** having at least one pass over the model checking every doc at the end of
the test seems like a good idea no matter what
*** I'm certain the existing "synchronized (model)" block is not thread safe
relative to the synchronized blocks that copy the model into commitedModel,
because the "model.put(...)" calls can change the iterator and trigger a
ConcurrentModificationException
*** there's a bunch of "TODO" blocks realted to deletes that still need
implemented
*** the writer threads should construct the SolrInputDocument themselves, and
log the whole document (not just the id) when they log things, so it's easier
to tell from the logs what updates succeed and which were rejected because of
version conflicts
*** why is {{//fail("Were were results: "+response);}} commented out?
*** there's a lot of "instanceof ArrayList" checks that make no sense to me
since the object came from getFirstValue
** DocInfo
*** should be a private static class and have some javadocs
*** or sould be a public class in it's own file, w/javadocs, and re-used in the
various tests that want ot reuse it
** verbose
*** why does this method exist? why aren't callers just using log.info(...)
directly?
*** or if callers really need to pass big sequences of stuff, they can use
{{log.info("\{\}", Arrays.asList(...))}}
*** or worst case: this method can simplified greatly to do that internally
** addDocAndGetVersion
*** using SolrTestCaseJ4.sdoc and SolrTestCaseJ4.params will make this method a
lot sorder
*** why are we synchronizing on cloudClient but updating with leaderClient?
**** if the point is just to ensure all udpates happem synchronously regardless
of client, then we should just define some {{public static final Object
UPDATE_SYNC = new Object("sync lock for updates");}} and use that
** getClientForLeader
*** i know this method is currently just a workaround for SOLR-8733, noting
that in the method javadocs seems important
*** if we refactor this test to use multiple shards before SOLR-8733 gets
resolved, this method can take in a uniqueKey, and consult the DocRouter to
pick the correct shard/node.
** replace createMap usage with SolrTestCaseJ4.map
** why override tearDown if we're just calling super?
* SolrDocument
** applyOlderUpdate
*** as mentioned before, i don't think this method is correct when it comes to
multivalued fields, and should have more unit tests of various permutations to
be sure
*** this functionality should probably be moved to a private helper method in
UpdateLog (it doesn't do anything that requires it have access to the internals
of the SolrDocument)
*** no matter where it lives, it should have some javadocs
> Support updates of numeric DocValues
> ------------------------------------
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt,
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt,
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz,
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt,
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt,
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be
> really nice to have Solr support this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]