A while back, a couple of outside devs reached out to me asking for advice on how to get started with Solr development.
Basically, I just tell people to find something to improve that has some synergy with their own Solr needs or pet peeves. I got the feeling that was not the best jump-starter for ideas though, and you usually have to come up with a couple on the spot, which is exactly when I have the fewest ideas. For those people, and anyone else in a similar situation, here is a small list of ideas I have (rough sketches for several of them are appended at the end of this post):

- Add support for Jetty Quickstart: https://webtide.com/jetty-9-quick-start/
- Upgrade Jetty to 10 or 11. That gets us default off-heap ByteBuffers for the client and server (this config went away in 9 and returns in 10/11), improved HttpClient classes (the old ones are still around but deprecated), and improved Quickstart support, and it is where the current dev momentum is.
- Add the most basic Servlet async request support: simply run the request on our own thread pool, with no actual async processing done. Solr's pool can then be the unlimited one, and Jetty can keep the limited pool it expects and works best with. (Sketch appended below.)
- Add a SolrQosFilter that extends Jetty's QoSFilter. Allow internal requests through to prevent deadlock, but use the async Servlet request feature via QoSFilter to suspend external requests on overload and to prioritize how they are un-suspended. (Sketch appended below.)
- Dig into removing @ThreadLeakLingering(linger = 10000) from SolrTestCase.
- Add a custom Jetty LifeCycle listener that gets notified of shutdown before Jetty shutdown has already begun (currently we are notified via ServletFilter#destroy, which is too late). Remove the live node entry there. (Sketch appended below.)
- Look into the current state of the static field checker for tests. Since the same JVM is reused across many, potentially all, of the tests, test classes should avoid leaving behind large static remnants. Is this check working? Are there offenders? (Sketch appended below.)
- Look into the @BeforeClass / @AfterClass method-name shadowing test rule. It should prevent test class hierarchies from using the same name for these static methods, because hard-to-pin-down issues can occur when that happens. Look into its effectiveness and current violations. See NoClassHooksShadowingRule. (Example of the hazard appended below.)
- Review inefficient Collection / StringBuilder size initialization. Make sizing improvements where a lot of capacity growth is guaranteed or likely, or where we already know the size. (Sketch appended below.)
- Run the tests under IntelliJ's allocation profiler, Java Flight Recorder, or YourKit with allocation recording on, and note the top allocated objects. Look at simple changes (reuse or alternative implementations, say) for some of the largest outliers.
- Change some ZooKeeper usages to the much more efficient async API. Its performance is essentially that of a MultiOp, but without the requirement that everything succeed or fail atomically. Bonus if you can change a path to run fully async, where step B only runs or consumes resources once async call A has finished. Most systems that provide async also provide reasonably good back pressure for free. Lots of wins in this area. (Sketch appended below.)
- Look at making ZkClient calls that hit ConnectionLoss simply wait until ZooKeeper has connected again, rather than retrying, failing, and making repeated attempts. Verify that on ConnectionLoss the system essentially goes quiet toward ZK until reconnection, instead of ramping up trying to make something with ZK happen. (Sketch appended below.)
- Investigate using the Lucene Segment Replicator replication strategy in PULL or TLOG replicas to take advantage of its NRT segment replication feature and its awesome, isolated testing and Lucene integration.
- Investigate combining or dropping calls to ZooKeeper.
  There are only so many types of calls that go to ZK, and in computer time they only bring back useful information at a pretty slow rate. Review calls that are hitting many times a second, or just generally more often than the information trade happening is worth. Updating some timely item once a second is perhaps reasonable; trying to do something so fast that the last call has not even completed, perhaps not. The system should not rely on cluster state being 100% up to date in the huge majority of cases, which means rapid updates are likely never very sensible.
- Make it dead simple to set up Solr to log in a JSON output format.
- Look at introducing fair locking into the TransactionLog. When locks on a path are constantly acquired and released, as happens there, especially with updates that can depend on each other, old requests' lock attempts often get beaten by newer requests along the lock/unlock chain, and that can cause traffic mayhem. (Sketch appended below.)

--
- Mark
http://about.me/markrmiller
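
The sketches referenced above follow. Class names, headers, and paths in them are illustrative assumptions, not existing Solr code. First, the basic Servlet async hand-off: the filter's only job is to move the work onto a Solr-owned pool and complete the async context when done. The filter and handler names here are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: hand each request to Solr's own, effectively unlimited
// pool so Jetty's pool can stay small and bounded. No real async processing.
public class AsyncHandoffFilter implements Filter {

  private ExecutorService solrPool;

  @Override
  public void init(FilterConfig cfg) {
    solrPool = Executors.newCachedThreadPool();
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain) {
    // Requires async-supported=true on the filter registration.
    AsyncContext ctx = req.startAsync();
    solrPool.submit(() -> {
      try {
        // Stand-in for Solr's normal, synchronous request handling.
        handle((HttpServletRequest) ctx.getRequest(), (HttpServletResponse) ctx.getResponse());
      } catch (IOException e) {
        // real code would log and set an error status here
      } finally {
        ctx.complete(); // tell Jetty the response is finished
      }
    });
  }

  private void handle(HttpServletRequest req, HttpServletResponse rsp) throws IOException {
    rsp.getWriter().write("handled on " + Thread.currentThread().getName());
  }

  @Override
  public void destroy() {
    solrPool.shutdown();
  }
}
```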
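For the SolrQosFilter idea, a rough sketch of the shape I mean. The internal-request header is a made-up marker; how internal requests are actually identified is part of the work.

```java
import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.eclipse.jetty.servlets.QoSFilter;

// Sketch only: internal (node-to-node) requests bypass QoS entirely so they can
// never deadlock behind suspended external requests; external requests get
// suspended/resumed by QoSFilter (via the async Servlet API) on overload.
public class SolrQosFilter extends QoSFilter {

  private static final String INTERNAL_HEADER = "Solr-Internal-Request"; // hypothetical marker

  @Override
  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest req = (HttpServletRequest) request;
    if (req.getHeader(INTERNAL_HEADER) != null) {
      chain.doFilter(request, response); // never throttle internal traffic
    } else {
      super.doFilter(request, response, chain); // QoSFilter handles suspension on overload
    }
  }

  @Override
  protected int getPriority(ServletRequest request) {
    // Suspended requests are resumed in priority order; this is the hook where
    // query vs. update vs. admin traffic could be ranked. Default for now.
    return super.getPriority(request);
  }
}
```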
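For the shutdown-notification idea, a sketch of a listener that fires when the Jetty Server is asked to stop, before teardown actually begins. The live-node removal is only a placeholder comment; wiring it to the real Solr code is the actual task.

```java
import org.eclipse.jetty.util.component.AbstractLifeCycle;
import org.eclipse.jetty.util.component.LifeCycle;

// Sketch: lifeCycleStopping fires when stop is requested, before the Server has
// started shutting down connectors and handlers, i.e. earlier than the point at
// which ServletFilter#destroy currently tells us about shutdown.
public class SolrShutdownListener extends AbstractLifeCycle.AbstractLifeCycleListener {

  @Override
  public void lifeCycleStopping(LifeCycle event) {
    // This is where we would remove this node's live_nodes entry in ZooKeeper,
    // so other nodes stop routing to it while Jetty can still serve in-flight requests.
  }
}

// Registration, wherever the Server instance is built (Jetty 9.x API):
//   Server server = ...;
//   server.addLifeCycleListener(new SolrShutdownListener());
```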
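For the static field checker item, the pattern it should be enforcing looks roughly like this; the class and fixture are invented for illustration.

```java
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Anything large that a test class stashes in a static gets nulled out in
// @AfterClass, so it does not stay reachable for the rest of the JVM-wide run.
public class BigFixtureTest {
  private static byte[] bigFixture;

  @BeforeClass
  public static void setUpFixture() {
    bigFixture = new byte[64 * 1024 * 1024];
  }

  @AfterClass
  public static void tearDownFixture() {
    bigFixture = null; // without this, 64MB stays live for every later test class
  }
}
```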
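For the @BeforeClass / @AfterClass shadowing rule, this is the hazard as I understand it: a subclass static hook with the same name hides the superclass hook, so JUnit silently skips the base class setup. The classes below are invented to show the shape of the problem.

```java
import org.junit.BeforeClass;

// Base class setup that every subclass is supposed to get.
class BaseSolrTest {
  @BeforeClass
  public static void beforeClass() {
    // start a test ZK / mini cluster, etc.
  }
}

public class MySolrTest extends BaseSolrTest {
  @BeforeClass
  public static void beforeClass() { // same name: the base hook above is shadowed and never runs
    // subclass-only setup; the cluster was never started
  }
}
```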
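For the Collection / StringBuilder sizing item, the kind of change meant is as small as this; the method is just an illustration of pre-sizing when the final size (or a decent lower bound) is already known.

```java
import java.util.ArrayList;
import java.util.List;

public class SizingExample {
  // Pre-size instead of growing from the defaults (ArrayList starts around 10,
  // StringBuilder at 16) through repeated array copies.
  static String joinIds(List<String> docIds) {
    List<String> ids = new ArrayList<>(docIds.size());        // was: new ArrayList<>()
    StringBuilder sb = new StringBuilder(docIds.size() * 20);  // rough per-entry estimate
    for (String id : docIds) {
      ids.add(id);
      sb.append(id).append(',');
    }
    return sb.toString();
  }
}
```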
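For the ZooKeeper async API item, a sketch of the style I mean using the plain ZooKeeper client; the paths and the follow-on write are made up. Step B only runs from step A's callback, no thread is parked waiting, and back pressure falls out naturally because you only issue as many calls as you have outstanding callbacks for.

```java
import org.apache.zookeeper.AsyncCallback.DataCallback;
import org.apache.zookeeper.AsyncCallback.StatCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class AsyncZkExample {
  // Read state.json and, only once that read has completed, issue the follow-up
  // write. Nothing blocks; errors arrive as return codes in the callback.
  static void readThenTouch(ZooKeeper zk, String collection) {
    String statePath = "/collections/" + collection + "/state.json"; // illustrative path
    zk.getData(statePath, false, (DataCallback) (rc, path, ctx, data, stat) -> {
      if (rc != KeeperException.Code.OK.intValue()) {
        // handle ConnectionLoss / NoNode etc. here
        return;
      }
      // step B: only runs, and only uses resources, after step A has finished
      zk.setData(path, data, stat.getVersion(),
          (StatCallback) (rc2, path2, ctx2, stat2) -> { /* done */ }, null);
    }, null);
  }
}
```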
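For the ConnectionLoss item, a rough sketch of the "go quiet until we are connected again" behavior, using a plain latch fed by the session watcher events. This is the concept, not how SolrZkClient is actually structured.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Sketch: instead of retry loops hammering ZK on ConnectionLoss, callers park on
// a single gate that the session watcher opens when SyncConnected arrives again.
public class ConnectedGate implements Watcher {

  private volatile CountDownLatch connected = new CountDownLatch(1);

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Event.KeeperState.SyncConnected) {
      connected.countDown();             // open the gate for all waiters
    } else if (event.getState() == Event.KeeperState.Disconnected) {
      connected = new CountDownLatch(1); // close it again until we reconnect
    }
  }

  // Called by code that just caught KeeperException.ConnectionLossException:
  // wait for the session to come back, then let the caller retry once.
  public void awaitConnected(long timeout, TimeUnit unit)
      throws InterruptedException, TimeoutException {
    if (!connected.await(timeout, unit)) {
      throw new TimeoutException("ZooKeeper did not reconnect within " + timeout + " " + unit);
    }
  }
}
```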
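Finally, for the fair-locking idea on the TransactionLog, the code change itself is conceptually tiny, as below; the interesting work is measuring whether FIFO hand-off (so old waiters are not perpetually beaten by new arrivals) buys enough in tail latency to justify what fairness costs in raw throughput. The class is only an illustration of the lock choice.

```java
import java.util.concurrent.locks.ReentrantLock;

public class TlogLockExample {
  // A non-fair lock (the default) lets a newly arriving thread barge in ahead of
  // threads already waiting; a fair lock grants it in roughly FIFO order.
  private final ReentrantLock tlogLock = new ReentrantLock(true); // true = fair

  public void append(byte[] record) {
    tlogLock.lock();
    try {
      // write the record to the log
    } finally {
      tlogLock.unlock();
    }
  }
}
```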