Re: Solr with encrypted HDFS

2019-09-11 Thread Hendrik Haddorp

Hi,

we have some setups that use an encryption zone in HDFS. Once you have
the HDFS config set up, the rest is transparent to the client, so
Solr works just fine like that. That said, we have some general issues
with Solr and HDFS. The main problem seems to be around the transaction
log files. We have quite a high commit rate, and these short-lived files
don't seem to play well with HDFS and the triple replication of blocks
in HDFS. But encryption did not add any issues for us.
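For context, pointing Solr at HDFS is just directory-factory configuration; once that is in place, an encryption zone covering the HDFS home is transparent to Solr. A minimal sketch (host names, paths, and values are illustrative, not our actual setup):

```xml
<!-- solrconfig.xml: store index data in HDFS. An encryption zone covering
     solr.hdfs.home is handled by the HDFS client and is transparent to Solr.
     Paths and values below are illustrative placeholders. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
<lockType>hdfs</lockType>
```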

regards,
Hendrik

On 11.09.19 22:53, John Thorhauer wrote:

Hi,

I am interested in encrypting/protecting my Solr indices.  I am wondering
if Solr can work with an encrypted HDFS.  I see that these instructions (
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-hdfs-encryption/content/configuring_and_using_hdfs_data_at_rest_encryption.html)
explain that:

"After permissions are set, Java API clients and HDFS applications with
sufficient HDFS and Ranger KMS access privileges can write and read to/from
files in the encryption zone"


So I am wondering whether the Solr/Java API that uses HDFS would work with this
as well, and also: has anyone had experience running this?  Either good
or bad?

Thanks,
John





Re: Question: Solr perform well with thousands of replicas?

2019-08-28 Thread Hendrik Haddorp

Hi,

we usually run Solr Clouds with 5 nodes, up to 2000 collections,
and a replication factor of 2, so we have close to 1000 cores per node.
That is on Solr 7.6, but I believe 7.3 worked as well. We tuned a few
caches down to a minimum, as otherwise the memory usage goes up a lot.
The Solr UI has some problems with a high number of collections,
like lots of timeouts when loading the status.

Older Solr versions had problems with the overseer queue in ZooKeeper. If
you restarted too many nodes at once, the queue got too long and
Solr died, requiring some manual help and cleanup before it would start again at all.
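As a sketch of the kind of cache tuning mentioned above: the element names are the standard Solr 7.x query caches in solrconfig.xml, but the sizes below are illustrative, not our actual values:

```xml
<!-- solrconfig.xml: shrink per-core caches when running ~1000 cores per node,
     since every core pays the memory cost. Sizes here are illustrative only. -->
<query>
  <filterCache class="solr.FastLRUCache" size="16" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="16" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="16" initialSize="0" autowarmCount="0"/>
</query>
```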

regards,
Hendrik

On 29.08.19 05:27, Hongxu Ma wrote:

Hi
I have a solr-cloud cluster, but it's unstable when the collection count is large: 
1000 replicas/cores per Solr node.

To solve this issue, I have read the performance guide:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

I noted there is a sentence in the solr-cloud section:
"Recent Solr versions perform well with thousands of replicas."

Does this mean that a single Solr node can handle thousands of
replicas, or that a whole cluster can (and if so, what is the size of the cluster)?

My Solr versions are 7.3.1 and 6.6.2 (they appear to perform the same).

Thanks for your help.






NullPointerException in QueryComponent.unmarshalSortValues

2019-06-07 Thread Hendrik Haddorp

Hi,

I'm doing a simple *:* search on an empty multi-sharded collection using
Solr 7.6 and am getting this exception:

NullPointerException
    at
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)
    at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)
    at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)
    at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)
    at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:426)
    at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)

This is the same exception as reported in
https://issues.apache.org/jira/browse/SOLR-12060 and likely also
https://issues.apache.org/jira/browse/SOLR-11643. Sometimes I can do
multiple requests, and some fail while some pass. This test is done on
a single node that, for testing, uses a collection with four shards.

I looked a bit into the code:
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
It seems like
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L884
returns null.
This should mean that
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L482
did not get invoked, which would happen if FIELD_SORT_VALUES is not set
to true. And indeed, if I add fsv=true to my query the NPE does not show
up. So there are a few questions:
1) what is fsv=true about?
2) why do I need to set it?
3) why don't I get the NPE all the time?

Earlier it looked as if the problem only showed up if I enabled the
suggester or spellcheck component. But after having done tons of tests
things are not that consistent.
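The workaround above can be written directly into the request; the collection name below is a placeholder:

```text
# Distributed *:* query that avoids the NPE by requesting field sort values:
/solr/mycollection/select?q=*:*&fsv=true
```

fsv corresponds to the FIELD_SORT_VALUES parameter referenced above, which shard-internal requests normally set for each other so that mergeIds can unmarshal sort values.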

thanks,
Hendrik


Re: Status of solR / HDFS-v3 compatibility

2019-05-03 Thread Hendrik Haddorp

We have some Solr 7.6 setups connecting to HDFS 3 clusters. So far that
did not show any compatibility problems.

On 02.05.19 15:37, Kevin Risden wrote:

For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency.
Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some
testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3,
but the libraries are Hadoop 3.

The reference guide for 8.0+ hasn't been released yet, but I also don't think
it was updated.

Kevin Risden


On Thu, May 2, 2019 at 9:32 AM Nicolas Paris wrote:


Hi

The Solr doc [1] says it's only compatible with HDFS 2.x.
Is that true?


[1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html

--
nicolas





Re: NPE deleting expired docs (SOLR-13281)

2019-03-13 Thread Hendrik Haddorp

We have the same issue on Solr 7.6.

On 12.03.2019 16:05, Gerald Bonfiglio wrote:

Has anyone else observed NPEs attempting to have expired docs removed?  I'm 
seeing the following exceptions:

2019-02-28 04:06:34.849 ERROR (autoExpireDocs-30-thread-1) [ ] 
o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic 
deletion of expired docs: null
java.lang.NullPointerException: null
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.handleReplicationFactor(DistributedUpdateProcessor.java:992)
 ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi 
- 2019-02-04 23:23:46]
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:960)
 ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi 
- 2019-02-04 23:23:46]

Seems all that's required to reproduce it is to include 
DocExpirationUpdateProcessorFactory in an updateRequestProcessorChain.

More details can be found at: 
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-13281
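For reference, a minimal chain along the lines described; the field names and the check interval below are illustrative defaults, not taken from the report:

```xml
<!-- solrconfig.xml: periodically delete docs whose TTL has elapsed.
     Field names and the 30s period are illustrative. -->
<updateRequestProcessorChain name="expire" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```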










Re: Increasing solr nodes

2019-02-12 Thread Hendrik Haddorp
You can use the MOVEREPLICA command: 
https://lucene.apache.org/solr/guide/7_6/collections-api.html
Alternatively, you can add another replica and then remove one of your 
old replicas.
When you add a replica you can either specify the node it shall be placed on 
or let Solr pick a node for you.
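A sketch of the Collections API calls described above; the collection, shard, node, and replica names are placeholders:

```text
# Move a replica of shard1 to a specific node:
/admin/collections?action=MOVEREPLICA&collection=mycoll&shard=shard1
    &sourceNode=host1:8983_solr&targetNode=host2:8983_solr

# Or add a replica first, then drop the old one:
/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr
/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3
```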


On 12.02.2019 09:15, neerajbhatt wrote:

Hi

We have a solr cluster of 3 machines. A collection has three shards and 2
replicas, so 9 cores in total. Right now each machine has one shard leader and 2
replicas.
Because of the index size we need to increase the cluster to 9. What is the best
possible way to move a shard leader or replica to a new node?





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: COLLECTION CREATE and CLUSTERSTATUS changes in SOLR 7.5.0

2019-02-10 Thread Hendrik Haddorp
Do you have something about legacyCloud in your CLUSTERSTATUS response? 
I have "properties":{"legacyCloud":"false"}.
In the legacy cloud mode, also called format 1, the state is stored in a 
central clusterstate.json node in ZK, which does not scale well. In the 
modern mode every collection has its own state.json node in ZK. I guess 
there is something mixed up on your system. I would make sure not to use 
the legacy mode.
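To check or flip the property, the cluster APIs can be used; a sketch:

```text
# Inspect: the response shows "properties":{"legacyCloud":"false"} when disabled.
/admin/collections?action=CLUSTERSTATUS

# Explicitly disable the legacy (format 1) state handling:
/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false
```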


On 11.02.2019 05:57, ramyogi wrote:

I found the reason:
=true. When I create a collection with this parameter I can
find that replica data in the CLUSTERSTATUS API response. Is there anything
wrong if I use this in Solr 7.5.0 when creating a collection?
Please advise.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Solr moved all replicas from node

2019-02-10 Thread Hendrik Haddorp

I opened https://issues.apache.org/jira/browse/SOLR-13240 for the exception.

On 10.02.2019 01:35, Hendrik Haddorp wrote:

Hi,

I have two Solr clouds using Version 7.6.0 with 4 nodes each and about 
500 collections with one shard and a replication factor of 2 per Solr 
cloud. The data is stored in the HDFS. I restarted the nodes one by 
one and always waited for the replicas to fully recover before I 
restarted the next. Once the last node was restarted I noticed that 
Solr was starting to move replicas to other nodes. Actually it started 
to move all replicas from one node, which is now left empty. Is there 
any way to figure out why Solr decided to move all replicas to other 
nodes?
The only problem that I see is that during the recovery the Solr 
instance logged a problem with HDFS, claiming that the filesystem 
is closed. The recovery seems to have continued just fine after 
that though, and the logs are clean for the time afterwards.
I restarted the node now and invoked the UTILIZENODE action, which moved 
a few replicas back to the node but then failed with this exception:


{
  "responseHeader":{
    "status":500,
    "QTime":40220},
  "Operation utilizenode caused 
exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException: 
Comparison method violates its general contract!",

  "exception":{
    "msg":"Comparison method violates its general contract!",
    "rspCode":-1},
  "error":{
    "metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Comparison method violates its general contract!",
    "trace":"org.apache.solr.common.SolrException: Comparison method 
violates its general contract!\n\tat 
org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat 
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
org

Re: CloudSolrClient getDocCollection

2019-02-10 Thread Hendrik Haddorp
I opened now https://issues.apache.org/jira/browse/SOLR-13239 for the 
problem I observed.


Well, who can really be sure about those things? But I would assume it 
should still be OK. The number of watchers should still not be gigantic. 
I have setups with about 2000 collections each but far fewer JVMs. ZK 
distributes the watches over all nodes, which should also include 
observer nodes.


That said, an alternative could be to refresh the cache asynchronously to 
the call detecting it to be outdated. Wouldn't the worst case be that a 
request gets sent to a Solr node that has to forward the request to the 
correct node? The chance of the cache entry being wrong after just one 
minute is, however, quite low. So in most cases the request would still be 
sent to the correct node without having to wait for the cache update and 
without potentially blocking other requests. In a performance test we 
saw quite a few threads being blocked at this point.


regards,
Hendrik

On 09.02.2019 20:40, Erick Erickson wrote:

Jason's comments are exactly why there _is_ a state.json per
collection rather than the single clusterstate.json in the original
implementation.

Hendrik:
yes, please do open a JIRA for the condition you observed,
especially if you can point to the suspect code. There have
been intermittent issues with collection creation in the test
shells.

About the watchers.

bq. Yes, you would need one watch per state.json and
thus one watch per collection. That should however not really be a
problem with ZK.

Consider an installation I have witnessed with 450K replicas scattered
over 100s of collections and 100s of JVMs. Each JVM may have one
or more CloudSolrClients. Are you _sure_ ZK can handle that kind
of watch load? The current architecture allows there to be many fewer
watches set, partially to deal with this scale. And even at this scale,
an incoming request to a node that does _not_ host _any_ replica of
the target collection needs to be able to forward the request, but doesn't
need to know much else about the target collections.

Best,
Erick


On Fri, Feb 8, 2019 at 5:23 PM Hendrik Haddorp  wrote:

Hi Jason,

thanks for your answer. Yes, you would need one watch per state.json and
thus one watch per collection. That should however not really be a
problem for ZK. I would assume that the Solr server instances need to
monitor those nodes to be up to date on the cluster state. Using
org.apache.solr.common.cloud.ZkStateReader.registerCollectionStateWatcher
you can even add such a watch using the SolrJ API. At least for the
currently watched collections, the client should thus actually already
have the correct information available. The access to that would likely
be a bit ugly though.

The CloudSolrClient also allows to set a watch on /collections using
org.apache.solr.common.cloud.ZkStateReader.registerCloudCollectionsListener.
This is actually another thing I just ran into. As the code has a watch
on /collections, the listener gets informed about a new collection as soon
as the "directory" for the collection is created. If the listener
then straight away tries to access the collection info via
zkStateReader.getClusterState(), the DocCollection can be returned as
null, as the DocCollection is built from the information stored in the
state.json file, which might not exist yet. I'm trying to monitor the
Solr cluster state and thus ran into this. Not sure if I should open a
Jira for that.

regards,
Hendrik

On 08.02.2019 23:20, Jason Gerlowski wrote:

Hi Henrik,

I'll try to answer, and let others correct me if I stray.  I wasn't
around when CloudSolrClient was written, so take this with a grain of
salt:

"Why does the client need that timeout? Wouldn't it make sense to
use a watch?"

You could probably write a CloudSolrClient that uses watch(es) to keep
track of changing collection state.  But I suspect you'd need a
watch-per-collection, instead of just a single watch.

Modern versions of Solr store the state for each collection in
individual "state.json" ZK nodes
("/solr/collections//state.json").  To catch changes
to all of these collections, you'd need to watch each of those nodes.
Which wouldn't scale well for users who want lots of collections.  I
suspect this was one of the concerns that nudged the author(s) to use
a cache-based approach.

(Even when all collection state was stored in a single ZK node, a
watch-based CloudSolrClient would likely have scaling issues for the
many-collection use case.  The client would need to recalculate its
state information for _all_ collections any time that _any_ of the
collections changed, since it has no way to tell which collection was
changed.)

Best,

Jason

On Thu, Feb 7, 2019 at 11:44 AM Hendrik Haddorp  wrote:

Hi,

when I perform a query using the CloudSolrClient the code first
retrieves the DocCollection to determine to which instance the query
should be send [1]. getDocCollection [2] does a lookup in a c

Re: Solr moved all replicas from node

2019-02-10 Thread Hendrik Haddorp

Solr version is 7.6.0
autoAddReplicas is set to true
/api/cluster/autoscaling returns this:

{
  "responseHeader":{
"status":0,
"QTime":1},
  "cluster-preferences":[{
  "minimize":"cores",
  "precision":1}],
  "cluster-policy":[{
  "replica":"<2",
  "shard":"#EACH",
  "node":"#ANY"}],
  "triggers":{
".auto_add_replicas":{
  "name":".auto_add_replicas",
  "event":"nodeLost",
  "waitFor":1800,
  "enabled":true,
  "actions":[{
  "name":"auto_add_replicas_plan",
  "class":"solr.AutoAddReplicasPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
".scheduled_maintenance":{
  "name":".scheduled_maintenance",
  "event":"scheduled",
  "startTime":"NOW",
  "every":"+1DAY",
  "enabled":true,
  "actions":[{
  "name":"inactive_shard_plan",
  "class":"solr.InactiveShardPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]}},
  "listeners":{
".auto_add_replicas.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":["STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
"IGNORED"],
  "trigger":".auto_add_replicas",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
".scheduled_maintenance.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":["STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
    "IGNORED"],
  "trigger":".scheduled_maintenance",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"}},
  "properties":{},
  "WARNING":"This response format is experimental.  It is likely to change in the 
future."}

I have two solr clouds that are set up in the same way. When restarting 
the nodes, only one of them showed this behavior.
Ideally I want replicas to be moved when a node is down for a longer 
time, but not when I just restart it. I would also like all nodes to end 
up with the same number of cores.
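One knob for the restart case is the nodeLost trigger's waitFor shown in the autoscaling output above: replicas should only be moved once a node has been gone that long. A sketch of raising it via the autoscaling write API (the 3600s value is illustrative):

```json
{
  "set-trigger": {
    "name": ".auto_add_replicas",
    "event": "nodeLost",
    "waitFor": "3600s",
    "enabled": true
  }
}
```

This body would be POSTed to /api/cluster/autoscaling; with the current 1800s value, a node kept down longer than 30 minutes during a rolling restart could plausibly trigger the moves observed.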


On 10.02.2019 05:30, Erick Erickson wrote:

What version of Solr? Do you have any of the autoscaling stuff turned
on? What about autoAddReplicas (which does not need Solr 7x)?

On Sat, Feb 9, 2019 at 4:35 PM Hendrik Haddorp  wrote:

Hi,

I have two Solr clouds using Version 7.6.0 with 4 nodes each and about
500 collections with one shard and a replication factor of 2 per Solr
cloud. The data is stored in the HDFS. I restarted the nodes one by one
and always waited for the replicas to fully recover before I restarted
the next. Once the last node was restarted I noticed that Solr was
starting to move replicas to other nodes. Actually it started to move
all replicas from one node, which is now left empty. Is there any way to
figure out why Solr decided to move all replicas to other nodes?
The only problem that I see is that during the recovery the Solr
instance logged a problem with HDFS, claiming that the filesystem is
closed. The recovery seems to have continued just fine after that though,
and the logs are clean for the time afterwards.
I restarted the node now and invoked the UTILIZENODE action, which moved a
few replicas back to the node but then failed with this exception:

{
"responseHeader":{
  "status":500,
  "QTime":40220},
"Operation utilizenode caused
exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
Comparison method violates its general contract!",
"exception":{
  "msg":"Comparison method violates its general contract!",
  "rspCode":-1},
"error":{
  "metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
  "msg":"Comparison method violates its general contract!",
  "trace":"org.apache.solr.commo

Solr moved all replicas from node

2019-02-09 Thread Hendrik Haddorp

Hi,

I have two Solr clouds using Version 7.6.0 with 4 nodes each and about 
500 collections with one shard and a replication factor of 2 per Solr 
cloud. The data is stored in the HDFS. I restarted the nodes one by one 
and always waited for the replicas to fully recover before I restarted 
the next. Once the last node was restarted I noticed that Solr was 
starting to move replicas to other nodes. Actually it started to move 
all replicas from one node, which is now left empty. Is there any way to 
figure out why Solr decided to move all replicas to other nodes?
The only problem that I see is that during the recovery the Solr 
instance logged a problem with HDFS, claiming that the filesystem is 
closed. The recovery seems to have continued just fine after that though, 
and the logs are clean for the time afterwards.
I restarted the node now and invoked the UTILIZENODE action, which moved a 
few replicas back to the node but then failed with this exception:


{
  "responseHeader":{
    "status":500,
    "QTime":40220},
  "Operation utilizenode caused 
exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException: 
Comparison method violates its general contract!",

  "exception":{
    "msg":"Comparison method violates its general contract!",
    "rspCode":-1},
  "error":{
    "metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Comparison method violates its general contract!",
    "trace":"org.apache.solr.common.SolrException: Comparison method 
violates its general contract!\n\tat 
org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat 
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat 

Re: CloudSolrClient getDocCollection

2019-02-08 Thread Hendrik Haddorp

Hi Jason,

thanks for your answer. Yes, you would need one watch per state.json and 
thus one watch per collection. That should however not really be a 
problem for ZK. I would assume that the Solr server instances need to 
monitor those nodes to be up to date on the cluster state. Using 
org.apache.solr.common.cloud.ZkStateReader.registerCollectionStateWatcher 
you can even add such a watch using the SolrJ API. At least for the 
currently watched collections, the client should thus actually already 
have the correct information available. The access to that would likely 
be a bit ugly though.


The CloudSolrClient also allows to set a watch on /collections using 
org.apache.solr.common.cloud.ZkStateReader.registerCloudCollectionsListener. 
This is actually another thing I just ran into. As the code has a watch 
on /collections, the listener gets informed about a new collection as soon 
as the "directory" for the collection is created. If the listener 
then straight away tries to access the collection info via 
zkStateReader.getClusterState(), the DocCollection can be returned as 
null, as the DocCollection is built from the information stored in the 
state.json file, which might not exist yet. I'm trying to monitor the 
Solr cluster state and thus ran into this. Not sure if I should open a 
Jira for that.


regards,
Hendrik

On 08.02.2019 23:20, Jason Gerlowski wrote:

Hi Henrik,

I'll try to answer, and let others correct me if I stray.  I wasn't
around when CloudSolrClient was written, so take this with a grain of
salt:

"Why does the client need that timeout? Wouldn't it make sense to
use a watch?"

You could probably write a CloudSolrClient that uses watch(es) to keep
track of changing collection state.  But I suspect you'd need a
watch-per-collection, instead of just a single watch.

Modern versions of Solr store the state for each collection in
individual "state.json" ZK nodes
("/solr/collections//state.json").  To catch changes
to all of these collections, you'd need to watch each of those nodes.
Which wouldn't scale well for users who want lots of collections.  I
suspect this was one of the concerns that nudged the author(s) to use
a cache-based approach.

(Even when all collection state was stored in a single ZK node, a
watch-based CloudSolrClient would likely have scaling issues for the
many-collection use case.  The client would need to recalculate its
state information for _all_ collections any time that _any_ of the
collections changed, since it has no way to tell which collection was
changed.)

Best,

Jason

On Thu, Feb 7, 2019 at 11:44 AM Hendrik Haddorp  wrote:

Hi,

when I perform a query using the CloudSolrClient, the code first
retrieves the DocCollection to determine which instance the query
should be sent to [1]. getDocCollection [2] does a lookup in a cache, which
has a 60s expiration time [3]. When a DocCollection has to be reloaded,
this is guarded by a lock [4]. By default there are 3 locks, which can
cause some congestion. The main question though is: why does the client
need that timeout? According to this [5] comment the code does not use a
watch. Wouldn't it make sense to use a watch? I thought the big
advantage of the CloudSolrClient is that it knows where to send requests
to, so that no extra hop needs to be done on the server side. Having to
query ZooKeeper for the current state does, however, take away some of
that advantage.

regards,
Hendrik

[1]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L849
[2]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1180
[3]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L162
[4]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1200
[5]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L821




CloudSolrClient getDocCollection

2019-02-07 Thread Hendrik Haddorp

Hi,

when I perform a query using the CloudSolrClient, the code first 
retrieves the DocCollection to determine which instance the query 
should be sent to [1]. getDocCollection [2] does a lookup in a cache, which 
has a 60s expiration time [3]. When a DocCollection has to be reloaded, 
this is guarded by a lock [4]. By default there are 3 locks, which can 
cause some congestion. The main question though is: why does the client 
need that timeout? According to this [5] comment the code does not use a 
watch. Wouldn't it make sense to use a watch? I thought the big 
advantage of the CloudSolrClient is that it knows where to send requests 
to, so that no extra hop needs to be done on the server side. Having to 
query ZooKeeper for the current state does, however, take away some of 
that advantage.


regards,
Hendrik

[1] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L849
[2] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1180
[3] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L162
[4] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1200
[5] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L821
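The caching scheme described above (a 60s TTL plus a small pool of striped locks guarding reloads) can be sketched in plain Java. This is an illustration of the mechanism, not SolrJ code; all names below are made up:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative sketch of a TTL cache with striped reload locks, mirroring
// what getDocCollection does: expired entries are reloaded under one of a
// small, fixed number of locks (3 by default in CloudSolrClient), so
// unrelated collections can contend on the same lock.
class StripedTtlCache<V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtNanos;
        Entry(V value, long loadedAtNanos) { this.value = value; this.loadedAtNanos = loadedAtNanos; }
    }

    private final ConcurrentHashMap<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Object[] locks;
    private final long ttlNanos;
    private final Function<String, V> loader;
    final AtomicInteger loads = new AtomicInteger(); // exposed for illustration

    StripedTtlCache(int lockCount, long ttlMillis, Function<String, V> loader) {
        this.locks = new Object[lockCount];
        for (int i = 0; i < lockCount; i++) locks[i] = new Object();
        this.ttlNanos = ttlMillis * 1_000_000L;
        this.loader = loader;
    }

    V get(String key) {
        Entry<V> e = cache.get(key);
        if (e != null && System.nanoTime() - e.loadedAtNanos < ttlNanos) {
            return e.value; // still fresh: no ZooKeeper round trip needed
        }
        // Missing or expired: reload under one of the striped locks.
        Object lock = locks[Math.floorMod(key.hashCode(), locks.length)];
        synchronized (lock) {
            e = cache.get(key); // re-check under the lock
            if (e != null && System.nanoTime() - e.loadedAtNanos < ttlNanos) {
                return e.value;
            }
            V v = loader.apply(key); // stands in for fetching state from ZooKeeper
            loads.incrementAndGet();
            cache.put(key, new Entry<>(v, System.nanoTime()));
            return v;
        }
    }

    public static void main(String[] args) {
        StripedTtlCache<String> c = new StripedTtlCache<>(3, 60_000L, k -> "state-of-" + k);
        String first = c.get("collection1");
        String second = c.get("collection1"); // served from the cache
        if (!first.equals(second) || c.loads.get() != 1) throw new AssertionError();
    }
}
```

A ZooKeeper watch, as suggested in the mail, would replace the TTL: the cache would be invalidated by a state-change notification instead of by a timer.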


Re: Large Number of Collections takes down Solr 7.3

2019-01-29 Thread Hendrik Haddorp
How much memory do the Solr instances have? Any more details on what 
happens when the Solr instances start to fail?

We are using multiple Solr clouds to keep the collection count low(er).

On 29.01.2019 06:53, Gus Heck wrote:

Does it all have to be in a single cloud?

On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey 
On 1/28/2019 8:12 PM, Monica Skidmore wrote:

I would have to negotiate with the middle-ware teams - but, we've used a

core per customer in master-slave mode for about 3 years now, with great
success.  Our pool of data is very large, so limiting a customer's searches
to just their core keeps query times fast (or at least reduces the chances
of one customer impacting another with expensive queries).  There is also a
little security added - since the customer is required to provide the core
to search, there is less chance that they'll see another customer's data in
their responses (like they might if they 'forgot' to add a filter to their
query).  We were hoping that moving to Cloud would help our management of
the largest customers - some of which we'd like to sub-shard with the cloud
tooling.  We expected cloud to support as many cores/collections as our
2-versions-old Solr instances - but we didn't count on all the increased
network traffic or the extra complications of bringing up a large cloud
cluster.

At this time, SolrCloud will not handle what you're trying to throw at
it.  Without Cloud, Solr can fairly easily handle thousands of indexes,
because there is no communication between nodes about cluster state.
The immensity of that communication (handled via ZooKeeper) is why
SolrCloud can't scale to thousands of shard replicas.

The solution to this problem will be twofold:  1) Reduce the number of
work items in the Overseer queue.  2) Make the Overseer do its job a lot
faster.  There have been small incremental improvements towards these
goals, but as you've noticed, we're definitely not there yet.

On the subject of a customer forgetting to add a filter ... your systems
should be handling that for them ... if the customer has direct access
to Solr, then all bets are off... they'll be able to do just about
anything they want.  It is possible to configure a proxy to limit what
somebody can get to, but it would be pretty complicated to come up with
a proxy configuration that fully locks things down.

Using shards is completely possible without SolrCloud.  But SolrCloud
certainly does make it a lot easier.

How many records in your largest customer indexes?  How big are those
indexes on disk?

Thanks,
Shawn





Re: SolrCloud recovery

2019-01-25 Thread Hendrik Haddorp
On a system with about 1600 collections, each having one shard and a 
replication factor of two it took around an hour to recover completely 
after an instance restart. The setup used HDFS for the storage. And we 
are using Solr 7.4 at the moment. The overseer queue management helped 
us a lot! Before that Solr could easily swirl into death if the queue 
grew too fast. I haven't checked the logs on what the recovery does. Is 
there anything specific to look for?


During the recovery one can see how Solr is going over the replicas one 
by one and never really working on more than about 5 replicas at a time, 
often less. The progress also seems to be done in alphabetical order. I 
believe that used to be different in older versions. I will try to give 
the coreLoadThreads setting a test.


Hendrik

On 25.01.2019 16:51, Erick Erickson wrote:

That's just _loading_, recovery happens later so I'd
be surprised if this really made a difference, but you
never know.

I'm more interested in _why_ recovery takes so long.
and why recovery happens in the first place. It's normal
for replicas when starting up to go from down->recovering->active,
that's just part of the normal cycle. But the recovering state
should be relatively short absent having to replicate the
index from the leader.

If active indexing is going on, then the replicas may have to
copy their index down from the leader. Does this happen
on a system that is not indexing?

What version of Solr? All the state changes go through
the Overseer, and there were some very significant improvements
in Solr 6.6+, see:
https://issues.apache.org/jira/browse/SOLR-10265

And can you put a number to "rather long"? There's a built-in
3 minute wait for leader election if there's no leader for
a slice. That's not relevant if the replica in recovery
belongs to a shard that already has a leader, but if you
restart your entire cluster it can come into play.

Best,
Erick

On Fri, Jan 25, 2019 at 3:32 AM Hendrik Haddorp  wrote:

Thanks, that sounds good. Didn't know that parameter.

On 25.01.2019 11:23, Vadim Ivanov wrote:

   You can try to tweak solr.xml


coreLoadThreads
Specifies the number of threads that will be assigned to load cores in parallel.

https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html


-----Original Message-----
From: Hendrik Haddorp [mailto:hendrik.hadd...@gmx.net]
Sent: Friday, January 25, 2019 11:39 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud recovery

Hi,

I have a SolrCloud with many collections. When I restart an instance and
the replicas are recovering I noticed that the number of replicas recovering
at any one point is usually around 5. This results in the recovery taking
rather long. Is there a configuration option that controls how many
replicas can recover in parallel?

thanks,
Hendrik




Re: SolrCloud recovery

2019-01-25 Thread Hendrik Haddorp

Thanks, that sounds good. Didn't know that parameter.

On 25.01.2019 11:23, Vadim Ivanov wrote:

  You can try to tweak solr.xml


coreLoadThreads
Specifies the number of threads that will be assigned to load cores in parallel.

https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html


-----Original Message-----
From: Hendrik Haddorp [mailto:hendrik.hadd...@gmx.net]
Sent: Friday, January 25, 2019 11:39 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud recovery

Hi,

I have a SolrCloud with many collections. When I restart an instance and
the replicas are recovering I noticed that the number of replicas recovering
at any one point is usually around 5. This results in the recovery taking
rather long. Is there a configuration option that controls how many
replicas can recover in parallel?

thanks,
Hendrik
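For reference, the coreLoadThreads setting mentioned in this thread is configured in solr.xml; a minimal sketch (the value 8 is an arbitrary example):

```xml
<solr>
  <!-- Number of threads used to load cores in parallel; 8 is illustrative. -->
  <int name="coreLoadThreads">8</int>
</solr>
```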




SolrCloud recovery

2019-01-25 Thread Hendrik Haddorp

Hi,

I have a SolrCloud with many collections. When I restart an instance and 
the replicas are recovering I noticed that the number of replicas recovering 
at any one point is usually around 5. This results in the recovery taking 
rather long. Is there a configuration option that controls how many 
replicas can recover in parallel?


thanks,
Hendrik


Re: Solr index writing to s3

2019-01-16 Thread Hendrik Haddorp
Theoretically you should be able to use the HDFS backend, which you can 
configure to use s3. The last time I tried that it did not work for some 
reason, however. Here is an example of that approach, which also seems to have 
ultimately failed: 
https://community.plm.automation.siemens.com/t5/Developer-Space/Running-Solr-on-S3/td-p/449360
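For illustration only: pointing the HDFS directory factory at s3a in solrconfig.xml would look roughly like the fragment below. As stated above, this route did not work reliably in practice, so treat it as a sketch rather than a recipe; the bucket name and paths are made up.

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Illustrative values: an s3a:// home directory and the Hadoop config
       directory that defines the s3a filesystem and its credentials. -->
  <str name="solr.hdfs.home">s3a://my-bucket/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>
```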


On 16.01.2019 19:39, Naveen M wrote:

hi,

My requirement is to write the index data into S3, we have solr installed
on aws instances. Please let me know if there is any documentation on how
to achieve writing the index data to s3.

Thanks





Re: Improve indexing speed?

2019-01-01 Thread Hendrik Haddorp
How are you indexing the documents? Are you using SolrJ or the plain 
REST API?
Are you sending the documents one by one or all in one request? The 
performance is far better if you send the 100 documents in one request.

If you send them individually, are you doing any commits between them?

regards,
Hendrik
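The batching advice above amounts to sending one update request whose body carries all 100 documents. A sketch of building such a JSON body in plain Java (the field names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Builds a single JSON array body for Solr's update handler instead of
// issuing one HTTP request per document. Field names are illustrative.
class BatchPayload {
    static String toJsonDoc(String id, String text) {
        return "{\"id\":\"" + id + "\",\"text\":\"" + text + "\"}";
    }

    static String buildBatch(List<String> jsonDocs) {
        return "[" + String.join(",", jsonDocs) + "]";
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add(toJsonDoc("doc-" + i, "value " + i));
        }
        // This one string would be POSTed once to /solr/<collection>/update
        // with Content-Type: application/json.
        String body = buildBatch(docs);
        if (!body.startsWith("[") || !body.endsWith("]")) throw new AssertionError();
    }
}
```

With SolrJ the same effect is achieved by passing the whole document list to a single add call rather than looping over one-document requests.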

On 01.01.2019 16:59, John Milton wrote:

Hi to all,

My document contains 65 fields. All the fields need to be indexed. But
indexing 100 documents takes 10 seconds.
I am using Solr 7.5 (2 cloud instance), with 50 shards.
It's running on Windows OS and it has 32 GB RAM. Java heap space 15 GB.
How to improve indexing speed?
Note :
All the fields contains maximum 20 characters only. Field type is text
general with case insensitive.

Thanks,
John Milton





Re: solr is using TLS1.0

2018-11-21 Thread Hendrik Haddorp

Hi Anchal,

the IBM JVM behaves differently in the TLS setup than the Oracle JVM. If 
you search for IBM Java TLS 1.2 you find tons of reports of problems 
with that. In most cases you can get around that using the system 
property "com.ibm.jsse2.overrideDefaultTLS" as documented here: 
https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.security.component.80.doc/security-component/jsse2Docs/matchsslcontext_tls.html


regards,
Hendrik
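Assuming Solr is started via the stock scripts, the property from the linked page can be appended to the JVM options in solr.in.sh; a sketch:

```shell
# In solr.in.sh: pass the IBM-specific system property to Solr's JVM so the
# IBM JSSE provider enables TLS 1.2 by default.
SOLR_OPTS="$SOLR_OPTS -Dcom.ibm.jsse2.overrideDefaultTLS=true"
```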

On 22.11.2018 07:25, Anchal Sharma2 wrote:


Hi Shawn ,

Thanks for your reply .

Here are the details about the Java we are using:
java version "1.8.0_151"
IBM J9 VM (build 2.9, JRE 1.8.0 AIX ppc64-64 Compressed References 
20171102_369060 (JIT enabled, AOT enabled)

I have already patched the policy jars .

And I tried to comment out the ciphers ,protocol entries in 
jetty-ssl.xml ,but it did not work for me .I also tried to use an 
"IncludeCipherSuites" entry to include a cipher I wanted to include 
,but it did not work either .I started getting 
SSL_ERROR_INTERNAL_ERROR_ALERT and ssl_error_no_cypher_overlap errors 
on my console URL. I tried this in Solr 7.3.1, so the Jetty version 
must also be relatively new.


Do you think java might not be letting me enable TLS1.2?

Thanks & Regards,
-
Anchal Sharma




From: Shawn Heisey 
To: solr-user@lucene.apache.org
Date: 21-11-2018 05:28
Subject: Re: solr is using TLS1.0





On 11/20/2018 3:02 AM, Anchal Sharma2 wrote:
> I have enabled  SSL for solr  using steps mentioned over Lucene
> website .And though solr console URL is now secure(https) ,it is still
> using TLS v1.0.
> I have  tried   few things to force SSL to use  TLS1.2 protocol ,but 
they

> have not worked for me .
>
> While trying to do same ,I have observed solr itself does not offer any
> solr property to specify cipher ,algorithm or TLS version .
>
> Following things have been tried :
> 1.key store /trust store for solr  to enable SSL  with different key
> algorithm ,etc combinations for the certificates
> 2.different  solr versions for step 1(solr 5.x,6.x,7.x-we are using solr
> 5.3 currently)
> 3.using java version 1.8 and adding solr certificate in java keystore to
> enforce TLS1.2

Solr lets Java and Jetty handle TLS.  Solr itself doesn't get involved
except to provide information to other software.

There are a whole lot of versions of Java 8, and at least three vendors
for it.  The big names are Oracle, IBM, and OpenJDK.  What vendor and
exact version of Java are you running? What OS is it on?  Do you have
the "unlimited JCE" addition installed in your Java and enabled?  If
your Java version is new enough, you won't need to mess with JCE.  See
this page:

https://golb.hplar.ch/2017/10/JCE-policy-changes-in-Java-SE-8u151-and-8u152.html

Solr 5.3 ships with Jetty 9.2.11, which is considered very outdated by
the Jetty project -- released well over three years ago.  From the
perspective of the Solr project, version 5.3 is also very old -- two
major versions behind what's current, and also released three years ago.

Jetty 9.2 is up to 9.2.26.  The current version is Jetty 9.4.14.  The
latest version of Solr (7.5.0) is shipping with Jetty 9.4.11. I think
Jetty will likely be upgraded to the latest release for Solr 7.6.0.

Have you made any changes to the Jetty config, particularly
jetty-ssl.xml?  One thing you might try, although I'll warn you that it
may make no difference at all, is to remove the parts of that config
file that exclude certain protocols and ciphers, letting Jetty decide
for itself what it should use.  Recent versions of Jetty and Java have
very good defaults.  I do not know whether Jetty 9.2.11 (included with
Solr 5.3, as mentioned) has good defaults or not.

Thanks,
Shawn









Re: Solr JVM Memory settings

2018-10-15 Thread Hendrik Haddorp
I wasn't stating that Docker is the solution at all, nor was I stating 
that the native memory would go down if you limit the heap.
I'm running Solr in Docker with a memory limit and thus have to make 
sure that the memory is limited as otherwise Linux kills the JVM. For 
that I'm limiting the heap and the metaspace, which is also often referred 
to as native memory. For the Oracle JVM (version 8) the heap and metaspace 
can be limited. There are a few more memory areas, which I believe you 
can not limit correctly. Just putting Solr in Docker will of course not 
do any of that for you.


On 12.10.2018 19:59, Christopher Schultz wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hendrik,

On 10/12/18 02:36, Hendrik Haddorp wrote:

Those constraints can be easily set if you are using Docker. The
problem is however that at least up to Oracle Java 8, and I believe
quite a bit further, the JVM is not at all aware about those
limits. That's why when running Solr in Docker you really need to
make sure that you set the memory limits lower. I usually set the
heap and metaspace size. How you set them depends again a bit on
your Solr configuration. I prefer the JVM to crash due to memory
limits rather then the Linux OOM Killer killing the JVM as the
OutOfMemoryError from the JVM does at least state what memory was
out.

Limiting the native memory used by attempting to limit the heap is not
actually limiting the native memory used. It's just an attempt to do
so. If you limit the native memory using OS limits (or, using Docker,
simply make it look like there is less system memory) then you haven't
actually achieved anything. You could have done that simply by
lowering heap values and avoided the complexity of Docker, etc.

- -chris


On 11.10.2018 16:45, Christopher Schultz wrote: Shawn,

On 10/11/18 12:54 AM, Shawn Heisey wrote:

On 10/10/2018 10:08 PM, Sourav Moitra wrote:

We have a Solr server with 8gb of memory. We are using solr
in cloud mode, solr version is 7.5, Java version is Oracle
Java 9 and settings for Xmx and Xms value is 2g but we are
observing that the RAM getting used to 98% when doing
indexing.

How can I ensure that SolrCloud doesn't use more than N GB
of memory ?

Where precisely are you seeing the 98% usage?  It is
completely normal for a modern operating system to report
that almost all the system memory is in use, at least after
the system has been shuffling a lot of data.  All modern
operating systems will use memory that has not been
specifically allocated to programs for disk caching purposes,
and system information tools will generally indicate that
this memory is in use, even though it can be instantly
claimed by any program that requests it.

https://en.wikipedia.org/wiki/Page_cache

If you tell a Java program that it is limited to a 2GB heap,
then that program will never use more than 2GB, plus a little
extra for the java runtime itself.  I cannot give you an
exact figure for that little bit extra.  But every bit of
data on disk that Solr accesses will end up (at least
temporarily) in the operating system's disk cache -- using
that unallocated memory.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

To be fair, the JVM can use *much more* memory than you have
specified for your Java heap. It's just that the Java heap itself
won't exceed those values.

The JVM uses quite a bit of native memory which isn't counted in
the Java heap. There is only one way I know of to control that, and
it's to set a process-limit at the OS level on the amount of
memory allowed. I'm not sure how sensitive to those limits the JVM
actually is, so attempting to artificially constrain the JVM might
end up with a native OOM crash.

-chris


-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlvA4O0ACgkQHPApP6U8
pFiCHg/+P+/yoSrvMd2uMyDK16nMCOIdxAL1gdS++DqS+qPmch1BHJTA9nuHybF4
j6WElpCI7Q3HP/sgsGE8kHE6Kg+DFJNz7mGJqgXjnSkm90LzETRFMqa959fTgBo6
SILD4n4LnZI844VoaKb2gIVibr804hloxX5UDe0XYFp3EtcVi4QMC5Q2ovn8+RoJ
S/LJx/VQi3AqtcCaEYAAKpYrKxO3OkoIKnN+oC55ag/16zh9StT2TUI03bBslcxn
PkS5zdsSmsS7NydSR4Gn4C7wAGyL3hGoU6pD+GhvYE9EF29KxHXFSIe2FJQ6mdRf
ikZvm17U8OFNwqlB4OOLziGvOkcmIgtqchnhUm80Qwtn0ZMbql2zwlIhOSPWbuPL
lq3F09p1QBqPjbxJdrcmpoSFH8jvmIPdrPOl3BbPEmDzNdnF03sEGP5gDyJ9/INB
AD/QhqvQEKUtMBPX+1/9dxOm+JyUDlARZQ7p4k1BeFjl2BI8imLUK/c6JlWJ757G
QWk+0Ff3R02va+ITWNvGs5C1uOnu2g58eqAggREPWXmXAj9nqJ5EyPkNAaGJBheo
NasGNSXVnjN+hk4QlMTAJ3C5u0Q5lW3HCOXj8Mufo7LE8M96OjRkM09o87NG9sGT
EdX7V8Ypw758Jt9xcms6U9tC2TqekJ9AYu+VLsoGa4OZgy5hfDk=
=Sq+f
-END PGP SIGNATURE-




Re: Solr JVM Memory settings

2018-10-12 Thread Hendrik Haddorp
Those constraints can be easily set if you are using Docker. The problem 
is however that at least up to Oracle Java 8, and I believe quite a bit 
further, the JVM is not at all aware about those limits. That's why when 
running Solr in Docker you really need to make sure that you set the 
memory limits lower. I usually set the heap and metaspace size. How you 
set them depends again a bit on your Solr configuration. I prefer the 
JVM to crash due to memory limits rather than the Linux OOM Killer 
killing the JVM as the OutOfMemoryError from the JVM does at least state 
what memory was out.


Hendrik

On 11.10.2018 16:45, Christopher Schultz wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 10/11/18 12:54 AM, Shawn Heisey wrote:

On 10/10/2018 10:08 PM, Sourav Moitra wrote:

We have a Solr server with 8gb of memory. We are using solr in
cloud mode, solr version is 7.5, Java version is Oracle Java 9
and settings for Xmx and Xms value is 2g but we are observing
that the RAM getting used to 98% when doing indexing.

How can I ensure that SolrCloud doesn't use more than N GB of
memory ?

Where precisely are you seeing the 98% usage?  It is completely
normal for a modern operating system to report that almost all the
system memory is in use, at least after the system has been
shuffling a lot of data.  All modern operating systems will use
memory that has not been specifically allocated to programs for
disk caching purposes, and system information tools will generally
indicate that this memory is in use, even though it can be
instantly claimed by any program that requests it.

https://en.wikipedia.org/wiki/Page_cache

If you tell a Java program that it is limited to a 2GB heap, then
that program will never use more than 2GB, plus a little extra for
the java runtime itself.  I cannot give you an exact figure for
that little bit extra.  But every bit of data on disk that Solr
accesses will end up (at least temporarily) in the operating
system's disk cache -- using that unallocated memory.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

To be fair, the JVM can use *much more* memory than you have specified
for your Java heap. It's just that the Java heap itself won't exceed
those values.

The JVM uses quite a bit of native memory which isn't counted in the
Java heap. There is only one way I know of to control that, and it's
to set a process-limit at the OS level on the amount of memory
allowed. I'm not sure how sensitive to those limits the JVM actually
is, so attempting to artificially constrain the JVM might end up with
a native OOM crash.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlu/YgoACgkQHPApP6U8
pFjcbBAAgYegk20pYvfu3vcrAKxj3s+RSMGRPZ+nN5g0KYQFuhFgptYE+TqjLfBX
geekQUNqNUHO5psMA5q/6m6b3LwpqrMxJiapv0wWQ2wPah21CgLs/P/iG+elNQ63
H0ZXbe3wX0P0onZbP4+sfDyzhujZ+5+gMooK87o8Q4z91hIVX1EZfM4lcaZ3pbnb
JJ44YorWGPpXjQNEtOHfS7l/Q+8+6+XfEyfKha3JpRFcwcqgLpv23Koy4xgxgYr+
PMqfjptMBMjZ04xSdd491crm2yZowv3KH1Ss8v/L51rknGYPxCEkdKvPrUlpn+Rb
4WnQS6H//dJvQaLum/qR9Jxd+3vc13K7Mn++5Lu+jMbeEgaJU2hD4/ap/KMtFCqn
eIXl6HQYPW36sVcm/MIpkRvAgx8vri17sd3/5sOYaETrp4SMxMN5W44GvgDdkbGF
R9/tVBCFWb3p+o8eSKUf7QmARiN69DHGVwtQHWMIp8K9893IeHUNgVXKD7281zLB
AjHPc7QTvAn4xne0X9lvQjr+YKOPxd9FFqMBejdKht9aBFQvApma9LtJT3FInrob
QkSIx594KhoRltRy7E9t3XuWWGg8ujiuzKl6SEPsgXUC2Opwr4Wwu1yn9dCWkFJz
RzCKbaDBaNmrK6HSEsoNvS+yQPksPxM8MuchFaCAMZpVOsobCM0=
=77dD
-END PGP SIGNATURE-




Re: Solr JVM Memory settings

2018-10-11 Thread Hendrik Haddorp
Beside the heap the JVM has other memory areas, like the metaspace: 
https://docs.oracle.com/javase/9/tools/java.htm

-> MaxMetaspaceSize
search for "size" in that document and you'll find tons of further 
settings. I have not tried out Oracle Java 9 yet.


regards,
Hendrik

On 11.10.2018 06:08, Sourav Moitra wrote:

Hello,

We have a Solr server with 8gb of memory. We are using solr in cloud
mode, solr version is 7.5, Java version is Oracle Java 9 and settings
for Xmx and Xms value is 2g but we are observing that the RAM getting
used to 98% when doing indexing.

How can I ensure that SolrCloud doesn't use more than N GB of memory ?

Sourav Moitra
https://souravmoitra.com
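A sketch of how the heap and metaspace caps discussed in this thread could be set in solr.in.sh; the sizes are examples only and must leave headroom inside any container or OS memory limit:

```shell
# In solr.in.sh: cap the Java heap and the metaspace. Values are illustrative.
SOLR_HEAP="2g"
SOLR_OPTS="$SOLR_OPTS -XX:MaxMetaspaceSize=256m"
```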




deprecated field types

2018-08-06 Thread Hendrik Haddorp

Hi,

the Solr documentation lists deprecated field types at:
https://lucene.apache.org/solr/guide/7_4/field-types-included-with-solr.html

Below the table the following is stated:
All Trie* numeric and date field types have been deprecated in favor of 
*Point field types. Point field types are better at range queries 
(speed, memory, disk), however simple field:value queries underperform 
relative to Trie. Either accept this, or continue to use Trie fields. 
This shortcoming may be addressed in a future release.


Given that it is suggested that one can keep using these fields, can I 
expect that the types will not be removed in Solr 8?


thanks,
Hendrik


NullPointerException in SolrMetricManager

2018-07-31 Thread Hendrik Haddorp

Hi,

we are seeing the following NPE sometimes when we delete a collection 
right after we modify the schema:


08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.ManagedResource 209 processStoredData - Loaded 
initArgs {ignoreCase=true} for /schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.schema.analysis.ManagedWordSetResource 116 
onManagedDataLoadedFromStorage - Loaded 119 words for 
/schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.ManagedResource 117 notifyObserversDuringInit - 
Notified 8 observers of /schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.RestManager 668 addRegisteredResource - Registered 
new managed resource /schema/analysis/stopwords/text_ar
08:47:46.408 [zkCallback-5-thread-4] INFO 
org.apache.solr.schema.IndexSchema 592 readSchema - Loaded schema 
solr-config/1.6 with uniqueid field id
08:47:46.408 [zkCallback-5-thread-4] INFO 
org.apache.solr.schema.ZkIndexSchemaReader 177 updateSchema - Finished 
refreshing schema in 411 ms
08:47:46.415 [qtp254749889-20] INFO  org.apache.solr.core.SolrCore 1517 
close - [donald.test-query-1533026857986_shard1_replica_n1] CLOSING 
SolrCore org.apache.solr.core.SolrCore@62ef7f0c
08:47:46.415 [qtp254749889-20] INFO 
org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing 
metric reporters for 
registry=solr.core.donald.test-query-1533026857986.shard1.replica_n1, 
tag=62ef7f0c
08:47:46.416 [qtp254749889-20] INFO 
org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing 
metric reporters for 
registry=solr.collection.donald.test-query-1533026857986.shard1.leader, 
tag=62ef7f0c
08:47:46.416 [Thread-20] INFO 
org.apache.solr.metrics.reporters.SolrJmxReporter 112 doInit - JMX 
monitoring for 
'solr.core.donald.test-query-1533026857986.shard1.replica_n1' (registry 
'solr.core.donald.test-query-1533026857986.shard1.replica_n1') enabled 
at server: com.sun.jmx.mbeanserver.JmxMBeanServer@2698dc7
08:47:46.417 [Thread-20] WARN  org.apache.solr.cloud.ZkController 2689 
lambda$fireEventListeners$6 - listener throws error 
org.apache.solr.common.SolrException: Unable to reload core 
[donald.test-query-1533026857986_shard1_replica_n1]
 at 
org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1411) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.core.SolrCore.lambda$getConfListener$20(SolrCore.java:3029) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.cloud.ZkController.lambda$fireEventListeners$6(ZkController.java:2687) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]

 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NullPointerException
 at 
org.apache.solr.metrics.SolrMetricManager.loadShardReporters(SolrMetricManager.java:1146) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.metrics.SolrCoreMetricManager.loadReporters(SolrCoreMetricManager.java:92) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:909) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at org.apache.solr.core.SolrCore.reload(SolrCore.java:663) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1390) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]

 ... 3 more

regards,
Hendrik


Re: SolrJ and autoscaling

2018-06-08 Thread Hendrik Haddorp

I opened a Jira for it: https://issues.apache.org/jira/browse/SOLR-12467

On 08.06.2018 07:24, Shalin Shekhar Mangar wrote:

Yes, we don't have Solrj support for changing autoscaling configuration
today. It'd be nice to have for sure. Can you please file a Jira? Patches
are welcome too!

On Wed, Jun 6, 2018 at 8:33 PM, Hendrik Haddorp 
wrote:


Hi,

I'm trying to read and modify the autoscaling config. The page at
https://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling-api.html
only documents the REST API. The read part does, however, also work via
SolrJ:

 cloudSolrClient.getZkStateReader().getAutoScalingConfig()

Just for the write part I could not find anything in the API. Is this
still a gap?

regards,
Hendrik
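Until SOLR-12467 adds write support to SolrJ, one workaround is to build the JSON command that the /admin/autoscaling endpoint documents and POST it with a plain HTTP client. A sketch of assembling such a command (the policy content is purely illustrative):

```java
// Builds the JSON body for the documented autoscaling write API
// (POST /solr/admin/autoscaling). The policy below is an arbitrary example.
class AutoscalingWrite {
    static String setClusterPolicy(String policyJsonArray) {
        return "{\"set-cluster-policy\":" + policyJsonArray + "}";
    }

    public static void main(String[] args) {
        String body = setClusterPolicy("[{\"replica\":\"<2\",\"node\":\"#ANY\"}]");
        // Send e.g. with java.net.http.HttpClient (Java 11+):
        //   HttpRequest.newBuilder(URI.create(solrBaseUrl + "/admin/autoscaling"))
        //              .POST(HttpRequest.BodyPublishers.ofString(body)).build();
        if (!body.contains("set-cluster-policy")) throw new AssertionError();
    }
}
```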








Re: Running Solr on HDFS - Disk space

2018-06-07 Thread Hendrik Haddorp
The only option should be to configure Solr to just have a replication 
factor of 1 or HDFS to have no replication. I would go for the middle 
and configure both to use a factor of 2. This way a single failure in 
HDFS and Solr is not a problem, while with a 1/3 or 3/1 setup a single 
server failure could bring the collection down.


Setting the HDFS replication factor is a bit tricky, as in some places 
Solr uses the default replication factor set on HDFS and sometimes 
uses a default from the client side. HDFS allows you to set a 
replication factor for every file individually.


regards,
Hendrik
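A cluster-side default of 2 could be set in the hdfs-site.xml that the HDFS clients read; an illustrative fragment (existing files keep their old factor and would need `hdfs dfs -setrep` to change):

```xml
<!-- hdfs-site.xml: default replication factor for newly created files. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```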

On 07.06.2018 15:30, Shawn Heisey wrote:

On 6/7/2018 6:41 AM, Greenhorn Techie wrote:

As HDFS has got its own replication mechanism, with a HDFS replication
factor of 3, and then SolrCloud replication factor of 3, does that mean
each document will probably have around 9 copies replicated 
underneath of
HDFS? If so, is there a way to configure HDFS or Solr such that only 
three

copies are maintained overall?


Yes, that is exactly what happens.

SolrCloud replication assumes that each of its replicas is a 
completely independent index.  I am not aware of anything in Solr's 
HDFS support that can use one HDFS index directory for multiple 
replicas.  At the most basic level, a Solr index is a Lucene index.  
Lucene goes to great lengths to make sure that an index *CANNOT* be 
used in more than one place.


Perhaps somebody who is more familiar with HDFSDirectoryFactory can 
offer you a solution.  But as far as I know, there isn't one.


Thanks,
Shawn





SolrJ and autoscaling

2018-06-06 Thread Hendrik Haddorp

Hi,

I'm trying to read and modify the autoscaling config. The page at 
https://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling-api.html 
only documents the REST API. The read part does, however, also work via 
SolrJ:


    cloudSolrClient.getZkStateReader().getAutoScalingConfig()

Just for the write part I could not find anything in the API. Is this 
still a gap?


regards,
Hendrik


managed resources and SolrJ

2018-05-08 Thread Hendrik Haddorp

Hi,

we are looking into using managed resources for synonyms via the 
ManagedSynonymGraphFilterFactory. It seems like there is no SolrJ API 
for that. I would be especially interested in one via the 
CloudSolrClient. I found 
http://lifelongprogrammer.blogspot.de/2017/01/build-rest-apis-to-update-solrs-managed-resources.html. 
Is there a better solution?


regards,
Hendrik


Re: collection properties

2018-04-14 Thread Hendrik Haddorp

I opened SOLR-12224 for this:
https://issues.apache.org/jira/browse/SOLR-12224

On 14.04.2018 01:49, Shawn Heisey wrote:

On 4/13/2018 5:07 PM, Tomás Fernández Löbbe wrote:

Yes... Unfortunately there is no GET API :S Can you open a Jira? Patch
should be trivial

My suggestion would be to return the list of properties for a collection
when a URL like this is used:

/solr/admin/collections?action=COLLECTIONPROP&name=gettingstarted

At the moment, this complains that the propertyName parameter is required
and missing.

If the "name" parameter is omitted, it should return the properties for
ALL collections.  The format of the single-collection response should be
the same as the all-collection response -- in the JSON, have a key for
the collection name and then under that, keys for each property.  It
would be nice to allow multiple "name" parameters for the list (when
propertyName is not present).

Do we also need a specific parameter to explicitly tell Solr to list the
properties?  Or maybe an explicit action value for listing them, like
LISTCOLLECTIONPROP?

Thanks,
Shawn





collection properties

2018-04-13 Thread Hendrik Haddorp

Hi,

with Solr 7.3 it is possible to set arbitrary collection properties 
using 
https://lucene.apache.org/solr/guide/7_3/collections-api.html#collectionprop
But how do I read out the properties again? So far I could not find a 
REST call that would return the properties. I do see my property in the 
ZK file collectionprops.json below my collection though.


thanks,
Hendrik
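
Until a GET API exists, the collectionprops.json znode can be read through the 
same internal endpoint the admin UI's Cloud > Tree view uses. A sketch, 
assuming default ports and that this endpoint's response format stays as the 
UI expects it:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Sketch: read a collection's collectionprops.json through the (internal)
 *  /admin/zookeeper endpoint, since there is no Collections API call for
 *  reading the properties back. */
public class CollectionProps {

    // Build the ZK-browse URL for the collection's property file.
    static String propsUrl(String solrBase, String collection) throws Exception {
        String zkPath = "/collections/" + collection + "/collectionprops.json";
        return solrBase + "/admin/zookeeper?detail=true&wt=json&path="
                + URLEncoder.encode(zkPath, "UTF-8");
    }

    // Fetch the response; the znode content is embedded in the JSON body.
    static String fetch(String url) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) body.append(line);
        }
        return body.toString();
    }
}
```

The znode data arrives wrapped in the browse response, so a real client still 
has to pull it out of the JSON; treating this internal endpoint as stable API 
is a gamble, which is why the Jira above asks for a proper GET.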


Re: in-place updates

2018-04-12 Thread Hendrik Haddorp

ah, right, sorry

On 11.04.2018 17:38, Emir Arnautović wrote:

Hi Hendrik,
Documentation clearly states conditions when in-place updates are possible: 
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
 
<https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates>
The first one mentions “numeric docValues”.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




On 11 Apr 2018, at 07:34, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Hi,

in 
http://lucene.472066.n3.nabble.com/In-Place-Updates-not-working-as-expected-tp4375621p4380035.html
 some restrictions on the supported fields are given. I could however not find 
if in-place updates are supported for all field types or if they only work for, 
say, numeric fields.

thanks,
Hendrik






in-place updates

2018-04-10 Thread Hendrik Haddorp

Hi,

in 
http://lucene.472066.n3.nabble.com/In-Place-Updates-not-working-as-expected-tp4375621p4380035.html 
some restrictions on the supported fields are given. I could however not 
find if in-place updates are supported for all field types or if they 
only work for, say, numeric fields.


thanks,
Hendrik


Re: Problem accessing /solr/_shard1_replica_n1/get

2018-03-24 Thread Hendrik Haddorp
Ah, ok, that might then be related to the auto add replica feature. 
Since trying Solr 7 I noticed that Solr is moving my cores around on its 
own. I did not see that happening in Solr 6. I believe Solr 6 could also 
move replicas on HDFS around, but I never actually saw that happen.


According to CloudConfig.java the default auto replica failover time is 
30s and I used to wait 2min when restarting nodes as otherwise I ran 
into problems with the overseer queue, which got fixed in later Solr 6 
releases. I'm actually just experimenting with increasing the failover 
time to 5min so that my nodes can restart before the replicas get moved. 
Maybe that does then also resolve this type of problem. Issue SOLR-12114 
does make changing the config a bit trickier, but I got it updated.
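
In Solr 7 that wait time lives in the autoscaling framework rather than in 
solr.xml: it is the waitFor of the .auto_add_replicas nodeLost trigger. A 
sketch of the payload, POSTed to /solr/admin/autoscaling (the 300s value is 
just the 5min experiment described above):

```json
{
  "set-trigger": {
    "name": ".auto_add_replicas",
    "event": "nodeLost",
    "waitFor": "300s",
    "enabled": true
  }
}
```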


thanks,
Hendrik

On 24.03.2018 18:31, Shawn Heisey wrote:

On 3/24/2018 11:22 AM, Hendrik Haddorp wrote:
below is the full entry from the Solr log. I actually also found the 
list of implicit request handlers later on. But that does make it 
even more strange that Solr complains about a missing handler.


The "not found" is rather generic, and might not be referring to the 
handler.  I wonder if we can improve those not found messages to 
indicate *what* wasn't found.


2018-03-22 18:19:25.599 ERROR 
(updateExecutor-3-thread-7-processing-n:search-agent3:9007_solr 
x:collection-0005_shard1_replica_n2 s:shard1 c:collection-0005 
r:core_node4) [c:collection-0005 s:shard1 r:core_node4 
x:collection-0005_shard1_replica_n2] o.a.s.c.SyncStrategy 
http://search-agent3:9007/solr/collection-0005_shard1_replica_n2/: 
Could not tell a replica to 
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://search-agent3:9007/solr: Unable to locate 
core collection-0005_shard1_replica_n1


Based on the end of what I quoted here, I think that the issue here 
might be that the *core* doesn't exist, not that the handler doesn't 
exist.  Which may mean that the info in zookeeper doesn't match the 
cores that are actually present and working.


If the core does exist on the disk, maybe Solr had a problem getting 
the core started.


Thanks,
Shawn





Re: Problem accessing /solr/_shard1_replica_n1/get

2018-03-24 Thread Hendrik Haddorp

Hi Shawn,

below is the full entry from the Solr log. I actually also found the 
list of implicit request handlers later on. But that does make it even 
more strange that Solr complains about a missing handler.


2018-03-22 18:19:25.583 ERROR 
(zkCallback-7-thread-2-processing-n:search-agent3:9007_solr) 
[c:collection-0005 s:shard1 r:core_node4 
x:collection-0005_shard1_replica_n2] o.a.s.c.SyncStrategy Sync request 
error: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://search-agent3:9007/solr/collection-0005_shard1_replica_n1: 
Expected mime type application/octet-stream but got text/html. 



Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/collection-0005_shard1_replica_n1/get. Reason:
    Not Found



2018-03-22 18:19:25.599 ERROR 
(updateExecutor-3-thread-7-processing-n:search-agent3:9007_solr 
x:collection-0005_shard1_replica_n2 s:shard1 c:collection-0005 
r:core_node4) [c:collection-0005 s:shard1 r:core_node4 
x:collection-0005_shard1_replica_n2] o.a.s.c.SyncStrategy 
http://search-agent3:9007/solr/collection-0005_shard1_replica_n2/: Could 
not tell a replica to 
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://search-agent3:9007/solr: Unable to locate 
core collection-0005_shard1_replica_n1
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)

    at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:300)
    at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

    at java.lang.Thread.run(Thread.java:748)

On 24.03.2018 03:52, Shawn Heisey wrote:

On 3/23/2018 4:08 AM, Hendrik Haddorp wrote:
I did not define a /get request handler but I also don't see one 
being default in the solrconfig.xml files that come with Solr 7.2.1. 
Do I need to add that as described in 
https://www.garysieling.com/blog/fixing-solrj-error-expected-mime-type-applicationoctet-stream-got-texthtml 
or is there something wrong in Solr? Is Solr doing those /get calls? 
My program is not doing any, unless SolrJ does them under the covers, 
and is actually not showing any error. 


Your error message looks like it was only the *end* of the message and 
was missing all the bits at the beginning that would tell us what the 
error was.  The whole error message is needed.


A number of the well-known request handlers are created *implicitly* 
when the config for them is not found in solrconfig.xml.


https://lucene.apache.org/solr/guide/7_2/implicit-requesthandlers.html

Thanks,
Shawn





Problem accessing /solr/_shard1_replica_n1/get

2018-03-23 Thread Hendrik Haddorp

Hi,

I have a Solr Cloud 7.2.1 setup and used SolrJ (7.2.1) to create 1000 
collections with a few documents. During that I got multiple times in 
the Solr logs exceptions because an access of the /get handler of a 
collection failed. The call stack looks like this:
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at 
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)

    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

    at java.lang.Thread.run(Thread.java:748)

I did not define a /get request handler but I also don't see one being 
default in the solrconfig.xml files that come with Solr 7.2.1. Do I need 
to add that as described in 
https://www.garysieling.com/blog/fixing-solrj-error-expected-mime-type-applicationoctet-stream-got-texthtml 
or is there something wrong in Solr? Is Solr doing those /get calls? My 
program is not doing any, unless SolrJ does them under the covers, and 
is actually not showing any error.


regards,
Hendrik


Re: collection reload leads to OutOfMemoryError

2018-03-18 Thread Hendrik Haddorp
I increased the metaspace size to 2GB. This way I could do multiple 
rounds of reloading all collections already. The GC logs now show an 
almost stable metaspace size. So maybe I just set the limits too 
low. Still, it is a bit odd that reloading the collections results in higher 
memory usage. Shouldn't all collections be loaded during startup?
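
For reference, an experiment like this can be pinned down in solr.in.sh; a 
sketch for a Java 8 JVM like the one above (values illustrative):

```shell
# solr.in.sh (sketch): cap metaspace instead of letting reloads exhaust it,
# and log GC so metaspace growth across reload rounds stays visible
SOLR_OPTS="$SOLR_OPTS -XX:MaxMetaspaceSize=2g"
SOLR_OPTS="$SOLR_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```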


On 18.03.2018 17:22, Hendrik Haddorp wrote:

Hi,

I did a simple test on a three node cluster using Solr 7.2.1. The JVMs 
(Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_162 
25.162-b12) have about 6.5GB heap and 1.5GB metaspace. In my test I 
have 1000 collections with only 1000 simple documents each. I'm then 
triggering collections reloads via SolrJ using a fixed number of 
threads, as this has shown memory issues in the past. Even with two 
threads the nodes eventually die with an OOM Error as they are running 
out of metaspace. I found the following Jiras that might be about the 
same issue:

    https://issues.apache.org/jira/browse/SOLR-10506
    https://issues.apache.org/jira/browse/SOLR-9117
    https://issues.apache.org/jira/browse/SOLR-6678

The first two are flagged as fixed in 7.0.

Any ideas, beside not doing reloads?

regards,
Hendrik




collection reload leads to OutOfMemoryError

2018-03-18 Thread Hendrik Haddorp

Hi,

I did a simple test on a three node cluster using Solr 7.2.1. The JVMs 
(Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_162 
25.162-b12) have about 6.5GB heap and 1.5GB metaspace. In my test I have 
1000 collections with only 1000 simple documents each. I'm then 
triggering collections reloads via SolrJ using a fixed number of 
threads, as this has shown memory issues in the past. Even with two 
threads the nodes eventually die with an OOM Error as they are running 
out of metaspace. I found the following Jiras that might be about the 
same issue:

    https://issues.apache.org/jira/browse/SOLR-10506
    https://issues.apache.org/jira/browse/SOLR-9117
    https://issues.apache.org/jira/browse/SOLR-6678

The first two are flagged as fixed in 7.0.

Any ideas, beside not doing reloads?

regards,
Hendrik


Re: Solr on DC/OS ?

2018-03-15 Thread Hendrik Haddorp

Hi,

we are running Solr on Marathon/Mesos, which should basically be the 
same as DC/OS. Solr and ZooKeeper are running in docker containers. I 
wrote my own Mesos framework that handles the assignment to the agents. 
There is a public sample that does the same for ElasticSearch. I'm not 
aware of a public Solr Mesos framework. The only "mediation" that 
happens here is that Solr runs in a docker container with a memory 
limit. If you give it enough resources it should be pretty close to 
running straight on the machine. JVM memory tuning inside Docker is, 
however, not the most fun.


regards,
Hendrik

On 15.03.2018 00:09, Rick Leir wrote:

Søren,
DC/OS installs on top of Ubuntu or RedHat, and it is used to coordinate many 
machines so they appear as a cluster.

Solr needs to be on a single machine, or in the case of SolrCloud, on many 
machines. It has no need of the coordination which DC/OS provides. Solr depends 
on direct access to lots of memory, and if any coordination layer attempts to 
mediate access to the memory then Solr would slow down. I recommend you install 
Solr directly on Ubuntu or Redhat or Windows Server (Disclosure: I know very 
little about DC/OS)
Cheers -- Rick


On March 14, 2018 6:19:22 AM EDT, "Søren"  wrote:

Hi, has anyone experience in running solr on DC/OS?

If so, how is that achieved succesfully? Solr is not in Universe.

Thanks in advance,
Soren




Re: SolrCloud update and luceneMatchVersion

2018-03-14 Thread Hendrik Haddorp

Thanks for the detailed description!

On 14.03.2018 16:11, Shawn Heisey wrote:

On 3/14/2018 5:56 AM, Hendrik Haddorp wrote:
So you are saying that we do not need to run the IndexUpgrader tool 
if we move from 6 to 7. Will the index be then updated automatically 
or will we get a problem once we move to 8?


If you don't run IndexUpgrader, and the index version is one that the 
new Solr can read, then existing index segments will remain in the 
format they are.  New segments will be written in the new format.  If 
any of the existing segments are merged, then the new larger segment 
will be in the new format.


Summary: If an index starts out as 6.x, then is run for a while in 
7.x, but there are still 6.x segments left, then that index will not 
work in 8.0.


IndexUpgrader is a Lucene tool.  This tool just runs a forceMerge 
process on the index, which will merge all of the existing segments 
into a single segment.  It's EXACTLY the same operation that Solr 
calls "optimize".  (Lucene used to call it optimize too.  Then they 
renamed it.)


How would one use the IndexUpgrader at all with Solr? Would one need 
to run it against the index of every core?


The Solr server must be shut down during the IndexUpgrader run. 
IndexUpgrader is a completely separate tool, part of Lucene.  It has 
zero knowledge of anything that you have configured in Solr, so you 
must locate the index directory of any core you want to upgrade and 
run the tool on that index directory.
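
Concretely, a per-core run might look like this (jar versions and paths are 
placeholders; the lucene-core and lucene-backward-codecs jars ship inside the 
Solr distribution, and Solr must be stopped first):

```shell
# Stop Solr first -- IndexUpgrader needs exclusive access to the index.
INDEX_DIR=/var/solr/data/mycollection_shard1_replica_n1/data/index
java -cp lucene-core-7.2.1.jar:lucene-backward-codecs-7.2.1.jar \
     org.apache.lucene.index.IndexUpgrader -delete-prior-commits "$INDEX_DIR"
```

For an HDFS-resident index this CLI has no direct equivalent: you would have 
to copy the index to local disk first (or write a small Lucene program that 
opens an HdfsDirectory), which is part of why rebuilding is often simpler.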


Thanks,
Shawn





Re: SolrCloud update and luceneMatchVersion

2018-03-14 Thread Hendrik Haddorp
So you are saying that we do not need to run the IndexUpgrader tool if 
we move from 6 to 7. Will the index be then updated automatically or 
will we get a problem once we move to 8?


How would one use the IndexUpgrader at all with Solr? Would one need to 
run it against the index of every core?


On 14.03.2018 11:14, Shawn Heisey wrote:

On 3/14/2018 3:04 AM, Hendrik Haddorp wrote:

we have a SolrCloud 6.3 with HDFS setup and plan to upgrade to 7.2.1.

The cluster upgrade instructions on 
https://lucene.apache.org/solr/guide/7_2/upgrading-a-solr-cluster.html 
does not contain any information on changing the luceneMatchVersion. 
If we change the luceneMatchVersion manually is it enough to just 
reload the collection or do we need to perform an index upgrade like 
the IndexUpgrader tool 
(https://lucene.apache.org/solr/guide/7_2/indexupgrader-tool.html) 
does? If so how would one use that for an index stored in HDFS? 


Most people seem to expect that defining luceneMatchVersion will allow 
them to build an index with the format of an earlier version.


This is not what happens.  Solr builds indexes in the format that the 
same version of Lucene chooses by default.  As far as I know, you can't 
change the index format in Solr.


Somebody who is writing a Lucene program (instead of using Solr) can 
choose to use an earlier version's format, but this is not done with 
luceneMatchVersion.


The luceneMatchVersion setting is used by the index analysis 
components (tokenizers, filters, etc).  Sometimes when a component's 
behavior is SIGNIFICANTLY changed by a version upgrade, the developer 
will put in a luceneMatchVersion check so users can revert to the old 
behavior if they want to.  Only a minority of changes to analysis 
components are controlled by luceneMatchVersion.


Generally there is no need to use the IndexUpgrader tool unless you're 
updating at least two major version numbers.  This is because a 7.x 
version can only read version 6.x indexes. Anything earlier must be 
upgraded to 6.x.


But it's my strong opinion that if you're upgrading two major 
versions, then you should build a new index from scratch, and use a 
new configuration that has been designed from the ground up for the 
new version.  Users who upgrade that far often find that they cannot 
use their configurations in the new version without changes, so they 
MUST rebuild.


I actually recommend always building the index from scratch for ANY 
Solr upgrade.


Thanks,
Shawn





SolrCloud update and luceneMatchVersion

2018-03-14 Thread Hendrik Haddorp

Hi,

we have a SolrCloud 6.3 with HDFS setup and plan to upgrade to 7.2.1.

The cluster upgrade instructions on 
https://lucene.apache.org/solr/guide/7_2/upgrading-a-solr-cluster.html 
does not contain any information on changing the luceneMatchVersion. If 
we change the luceneMatchVersion manually is it enough to just reload 
the collection or do we need to perform an index upgrade like the 
IndexUpgrader tool 
(https://lucene.apache.org/solr/guide/7_2/indexupgrader-tool.html) does? 
If so how would one use that for an index stored in HDFS?


regards,
Hendrik


Re: CLUSTERSTATUS API and Error loading specified collection / config in Solr 5.3.2.

2018-03-12 Thread Hendrik Haddorp

Hi,

are your collections using stateFormat 1 or 2? In version 1 all state 
was stored in one file while in version 2 each collection has its own 
state.json. I assume that with the old format it could happen that the 
common file still contains state for a collection that was deleted. So I 
would read out the clusterstate.json in ZooKeeper and manually verify 
its content.


regards,
Hendrik

On 11.03.2018 19:27, Atita Arora wrote:

Hi ,

I am working on an application which involves working on a highly
distributed Solr cloud environment. The application supports multi-tenancy
and we have around 250-300 collections on Solr where each client has their
own collection with a new shard being created as clientid-
where the timestamp is whenever the new data comes in for the client
(typically every 4-8 hrs) , the reason for this convention is to make sure
when the Indexes are being built (on demand) the timestamp matches closely
to the time when the last indexing was run (the earlier shard is
de-provisioned as soon as the new one is created). Whenever the indexing is
triggered it first makes a DB entry and then creates a catalog with
timestamp in solr.
The Solr cloud has 10 Nodes distributed geographically among 10 datacenters.
The replication factor is 2. The Solr version is 5.3.2.
Coming to my problem - I had to write a utility to ensure that the DB
insert timestamp matches closely to the Solr index timestamp wherein I can
ensure that if the difference between DB timestamp and Solr Index timestamp
is <= 2 hrs, we have a fresh index. The new index contains revised prices of
products or offers etc which are critical to be updated as in when they
come. Hence this utility is to track that the required updates have been
successfully made.
I used *CLUSTERSTATUS* api for this task. It is serving the purpose well so
far , but pretty recently our solr cloud started complaining of strange
things because of which the *CLUSTERSTATUS* api keeps returning as error.

The error claims to be of missing config & sometime missing collections
like.

org.apache.solr.common.SolrException: Could not find collection :

1785-1520548816454

org.apache.solr.common.SolrException: Could not find collection :
1785-1520548816454
at
org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:165)
at
org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:110)
at
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation$19.call(CollectionsHandler.java:614)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:166)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:678)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:444)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)

The other times it would complain of missing the config for same or
different client id- timestamp like :

1532-1518669619526_shard1_replica3:
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
Specified config does not exist in ZooKeeper:1532-1518669619526I

I would really appreciate if :


1. Someone can possibly guide me as to whats going on Solr Cloud
2. If CLUSTERSTATUS is the right pick to build such utility. Do we
have any other option?


Thanks for any pointers and suggestions.

Appreciate your attention looking this through.

Atita





MODIFYCOLLECTION via Solrj

2018-02-07 Thread Hendrik Haddorp

Hi,

I'm unable to find how I can do a MODIFYCOLLECTION via Solrj. I would 
like to change the replication factor of a collection but can't find it 
in the Solrj API. Is that not supported?


regards,
Hendrik
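
One workaround while SolrJ lacks a wrapper is to hit the Collections API 
directly (or to wrap the same parameters in SolrJ's GenericSolrRequest 
against /admin/collections). A stdlib-only sketch with illustrative names:

```java
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch: MODIFYCOLLECTION through the Collections HTTP API, since SolrJ
 *  (as of this thread) has no CollectionAdminRequest for it. */
public class ModifyCollection {

    // Build the Collections API call that changes the replication factor.
    static String modifyUrl(String solrBase, String collection, int replicationFactor) {
        return solrBase + "/admin/collections?action=MODIFYCOLLECTION"
                + "&collection=" + collection
                + "&replicationFactor=" + replicationFactor;
    }

    // Issue the call against any node; 200 means the change was accepted.
    static int call(String url) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        return con.getResponseCode();
    }
}
```

Note that MODIFYCOLLECTION only changes the stored replicationFactor; it does 
not by itself add or remove replicas.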


HDFS replication factor

2018-01-27 Thread Hendrik Haddorp

Hi,

when I configure my HDFS setup to use a specific replication factor, 
like 1, this only affects the index files that Solr writes. The 
write.lock files and backups are being created with a different 
replication factor. The reason for this should be that HdfsFileWriter is 
loading the defaults from the server 
(fileSystem.getServerDefaults(path)) while HdfsLockFactory and 
HdfsBackupRepository are simply using defaults, which seems to end up 
using a replication factor of 3 (and a block size of 128MB). Is this 
known? If not shall I open a JIRA for this?


regards,
Hendrik
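
If those code paths really fall back to client-side defaults, the defaults 
can at least be lowered in the client's hdfs-site.xml; a sketch, though 
whether each code path honors the client Configuration is exactly what would 
need verifying:

```xml
<!-- client-side hdfs-site.xml (sketch): lower the client-default replication
     that code paths ignoring getServerDefaults() would fall back to -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```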


Re: SolrJ with Async Http Client

2018-01-03 Thread Hendrik Haddorp
There is asynchronous and non-blocking. If I use 100 threads to perform 
calls to Solr using the standard Java HTTP client or SolrJ I block 100 
threads even if I don't block my program logic threads by using async 
calls. However if I perform those HTTP calls using a non-blocking HTTP 
client, like netty, I basically only need a single eventing thread in 
addition to my normal threads. The advantage is less memory usage and an 
often better scaling. I would however expect that the main advantage 
would be on the server side.
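
The JDK 11 HttpClient illustrates the distinction: sendAsync hands the I/O to 
a small pool of selector threads, so no caller thread is parked per request. 
A sketch (endpoint and collection names are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

/** Sketch: non-blocking Solr queries with the JDK 11 HttpClient, instead of
 *  SolrJ's thread-per-request HttpSolrClient. */
public class AsyncSolrQuery {

    // Build a /select request for the given collection and query string.
    static HttpRequest queryRequest(String solrBase, String collection, String q) {
        return HttpRequest.newBuilder(
                URI.create(solrBase + "/" + collection + "/select?wt=json&q=" + q))
            .GET().build();
    }

    // sendAsync returns immediately; no calling thread blocks on the I/O.
    static CompletableFuture<String> query(HttpClient client, HttpRequest req) {
        return client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }
}
```

A hundred in-flight queries then cost a hundred futures, not a hundred parked 
threads.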


On 02.01.2018 22:02, Gus Heck wrote:

It's not very clear (to me) what your use case is, but generally speaking,
asynchronous requests can be achieved by using threads/executors/futures
(java) or ajax (javascript). The link seems to be a scala project, I'm sure
scala has analogous facilities.

On Tue, Jan 2, 2018 at 10:31 AM, RAUNAK AGRAWAL 
wrote:


Hi Guys,

I am trying to write fully async service where solr calls are also async.
Just wondering did anyone tried calling solr in non-blocking mode or is
there is a way to do it? I have come across one such project
 but wondering is there anything provided
by solrj?

Thanks








Re: request dependent analyzer

2017-12-18 Thread Hendrik Haddorp

Hi, how do multiple analyzers help?

On 18.12.2017 10:25, Markus Jelsma wrote:

Hi - That is impossible. But you can construct many analyzers instead.
  
-Original message-

From:Hendrik Haddorp 
Sent: Monday 18th December 2017 8:35
To: solr-user 
Subject: request dependent analyzer

Hi,

currently we use a lot of small collections that all basically have the
same schema. This does not scale too well. So we are looking into
combining multiple collections into one. We would however like some
analyzers to behave slightly differently depending on the logical
collection. We would for example like to use different synonyms in the
different logical collections. Is there any clean way on how to do that,
like somehow access request parameters from an analyzer?

regards,
Hendrik





request dependent analyzer

2017-12-17 Thread Hendrik Haddorp

Hi,

currently we use a lot of small collections that all basically have the 
same schema. This does not scale too well. So we are looking into 
combining multiple collections into one. We would however like some 
analyzers to behave slightly differently depending on the logical 
collection. We would for example like to use different synonyms in the 
different logical collections. Is there any clean way on how to do that, 
like somehow access request parameters from an analyzer?


regards,
Hendrik
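
The "construct many analyzers" suggestion from the reply above translates 
into plain schema configuration: one field type per logical collection, each 
pointing at its own synonyms file, with documents indexed into (and queries 
routed to) that tenant's field. A sketch with hypothetical tenant names:

```xml
<!-- managed-schema (sketch): one analyzer per logical collection, selected
     by choosing the field at index/query time instead of at analyzer level -->
<fieldType name="text_tenant_a" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_tenant_a.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_tenant_b" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_tenant_b.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The cost is one field (and its index structures) per logical collection, but 
no runtime analyzer switching is needed.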


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Hendrik Haddorp
Ok, thanks for the answer. The leader election and update notification 
sound like they should work using ZooKeeper (leader election recipe and 
a normal watch) but I guess there are some details that make things more 
complicated.


On 09.12.2017 20:19, Erick Erickson wrote:

This has been bandied about on a number of occasions, it boils down to
nobody has stepped up to make it happen. It turns out there are a
number of tricky issues:


how does leadership change if the leader goes down?
the raw complexity of getting it right. Getting it wrong corrupts indexes
how do you resolve leadership in the first place so only the leader writes to 
the index?
how would that affect performance if N replicas were autowarming at the same 
time, thus reading from HDFS?
how do the read-only replicas know to open a new searcher?
I'm sure there are a bunch more.

So this is one of those things that everyone agrees is interesting,
but nobody is willing to code and it's not actually clear that it
makes sense in the Solr context. It'd be a pity to put in all the work
then discover that the performance issues prohibited using it.

If you _guarantee_ that the index doesn't change, there's the
NoLockFactory you could specify. That would allow you to share a
common index, woe be unto you if you start updating the index though.
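
The NoLockFactory route is selected in solrconfig.xml via lockType; a sketch, 
with exactly the caveat stated above:

```xml
<!-- solrconfig.xml (sketch): NoLockFactory; only safe when the shared index
     is guaranteed read-only for every process but one -->
<indexConfig>
  <lockType>none</lockType>
</indexConfig>
```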

Best,
Erick

On Sat, Dec 9, 2017 at 4:46 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Hi,

for the HDFS case wouldn't it be nice if there was a mode in which the
replicas just read the same index files as the leader? I mean after all the
data is already on a shared readable file system so why would one even need
to replicate the transaction log files?

regards,
Hendrik


On 08.12.2017 21:07, Erick Erickson wrote:

bq: Will TLOG replicas use less network bandwidth?

No, probably more bandwidth. TLOG replicas work like this:
1> the raw docs are forwarded
2> the old-style master/slave replication is used

So what you do save is CPU processing on the TLOG replica in exchange
for increased bandwidth.

Since the only thing forwarded in NRT replicas (outside of recovery)
is the raw documents, I expect that TLOG replicas would _increase_
network usage. The deal is that TLOG replicas can take over leadership
if the leader goes down so they must have an
up-to-date-after-last-index-sync set of tlogs.

At least that's my current understanding...

Best,
Erick

On Fri, Dec 8, 2017 at 12:01 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:

Anyone have any thoughts on this?  Will TLOG replicas use less network
bandwidth?

-Joe


On 12/4/2017 12:54 PM, Joe Obernberger wrote:

Hi All - this same problem happened again, and I think I partially
understand what is going on.  The part I don't know is what caused any
of
the replicas to go into full recovery in the first place, but once they
do,
they cause network interfaces on servers to go fully utilized in both
in/out
directions.  It appears that when a solr replica needs to recover, it
calls
on the leader for all the data.  In HDFS, the data from the leader's
point
of view goes:

HDFS --> Solr Leader Process -->Network--> Replica Solr Process -->HDFS

Do I have this correct?  That poor network in the middle becomes a
bottleneck and causes other replicas to go into recovery, which causes
more
network traffic.  Perhaps going to TLOG replicas with 7.1 would be
better
with HDFS?  Would it be possible for the leader to send a message to the
replica to instead get the data straight from HDFS instead of going from
one
solr process to another?  HDFS would better be able to use the cluster
since
each block has 3x replicas.  Perhaps there is a better way to handle
replicas with a shared file system.

Our current plan to fix the issue is to go to Solr 7.1.0 and use TLOG.
Good idea?  Thank you!

-Joe


On 11/22/2017 8:17 PM, Erick Erickson wrote:

Hmm. This is quite possible. Any time things take "too long" it can be
a problem. For instance, if the leader sends docs to a replica and
the request times out, the leader throws the follower into "Leader
Initiated Recovery". The smoking gun here is that there are no errors
on the follower, just the notification that the leader put it into
recovery.

There are other variations on the theme, it all boils down to when
communications fall apart replicas go into recovery.

Best,
Erick

On Wed, Nov 22, 2017 at 11:02 AM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:

Hi Shawn - thank you for your reply. The index is 29.9TBytes as
reported
by:
hadoop fs -du -s -h /solr6.6.0
29.9 T  89.9 T  /solr6.6.0

The 89.9TBytes is due to HDFS having 3x replication.  There are about
1.1
billion documents indexed and we index about 2.5 million documents per
day.
Assuming an even distribution, each node is handling about 680GBytes
of
index.  So our cache size is 1.4%. Perhaps 'relatively small block
cache'
was an understatement! This is why we split the largest c

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Hendrik Haddorp

Hi,

for the HDFS case wouldn't it be nice if there was a mode in which the 
replicas just read the same index files as the leader? I mean after all 
the data is already on a shared readable file system so why would one 
even need to replicate the transaction log files?


regards,
Hendrik

On 08.12.2017 21:07, Erick Erickson wrote:

bq: Will TLOG replicas use less network bandwidth?

No, probably more bandwidth. TLOG replicas work like this:
1> the raw docs are forwarded
2> the old-style master/slave replication is used

So what you do save is CPU processing on the TLOG replica in exchange
for increased bandwidth.

Since the only thing forwarded in NRT replicas (outside of recovery)
is the raw documents, I expect that TLOG replicas would _increase_
network usage. The deal is that TLOG replicas can take over leadership
if the leader goes down so they must have an
up-to-date-after-last-index-sync set of tlogs.

At least that's my current understanding...

Best,
Erick

On Fri, Dec 8, 2017 at 12:01 PM, Joe Obernberger
 wrote:

Anyone have any thoughts on this?  Will TLOG replicas use less network
bandwidth?

-Joe


On 12/4/2017 12:54 PM, Joe Obernberger wrote:

Hi All - this same problem happened again, and I think I partially
understand what is going on.  The part I don't know is what caused any of
the replicas to go into full recovery in the first place, but once they do,
they cause network interfaces on servers to go fully utilized in both in/out
directions.  It appears that when a solr replica needs to recover, it calls
on the leader for all the data.  In HDFS, the data from the leader's point
of view goes:

HDFS --> Solr Leader Process -->Network--> Replica Solr Process -->HDFS

Do I have this correct?  That poor network in the middle becomes a
bottleneck and causes other replicas to go into recovery, which causes more
network traffic.  Perhaps going to TLOG replicas with 7.1 would be better
with HDFS?  Would it be possible for the leader to send a message to the
replica to instead get the data straight from HDFS instead of going from one
solr process to another?  HDFS would better be able to use the cluster since
each block has 3x replicas.  Perhaps there is a better way to handle
replicas with a shared file system.

Our current plan to fix the issue is to go to Solr 7.1.0 and use TLOG.
Good idea?  Thank you!

-Joe


On 11/22/2017 8:17 PM, Erick Erickson wrote:

Hmm. This is quite possible. Any time things take "too long" it can be
a problem. For instance, if the leader sends docs to a replica and
the request times out, the leader throws the follower into "Leader
Initiated Recovery". The smoking gun here is that there are no errors
on the follower, just the notification that the leader put it into
recovery.

There are other variations on the theme, it all boils down to when
communications fall apart replicas go into recovery.

Best,
Erick

On Wed, Nov 22, 2017 at 11:02 AM, Joe Obernberger
 wrote:

Hi Shawn - thank you for your reply. The index is 29.9TBytes as reported
by:
hadoop fs -du -s -h /solr6.6.0
29.9 T  89.9 T  /solr6.6.0

The 89.9TBytes is due to HDFS having 3x replication.  There are about
1.1
billion documents indexed and we index about 2.5 million documents per
day.
Assuming an even distribution, each node is handling about 680GBytes of
index.  So our cache size is 1.4%. Perhaps 'relatively small block
cache'
was an understatement! This is why we split the largest collection into
two,
where one is data going back 30 days, and the other is all the data.
Most
of our searches are not longer than 30 days back.  The 30 day index is
2.6TBytes total.  I don't know how the HDFS block cache splits between
collections, but the 30 day index performs acceptable for our specific
application.

If we wanted to cache 50% of the index, each of our 45 nodes would need
a
block cache of about 350GBytes.  I'm accepting offers of DIMMs!
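The arithmetic in this paragraph can be reproduced in a few lines of Java; all inputs (45 nodes, 29.9 TB index, 84 x 128 MB cache blocks) come from the messages above, and the small differences from the quoted 1.4% / 350 GB figures are just rounding in the original mails:

```java
// Back-of-the-envelope sizing check for the figures in this thread.
public class HdfsCacheSizing {

    // HDFS block cache per node, in GB, given slab count and slab size.
    static double cachePerNodeGb(int blocks, long blockBytes) {
        return blocks * (double) blockBytes / (1024.0 * 1024 * 1024);
    }

    // Index data handled per node, in GB, assuming an even distribution.
    static double indexPerNodeGb(double indexTb, int nodes) {
        return indexTb * 1024 / nodes;
    }

    public static void main(String[] args) {
        double cacheGb = cachePerNodeGb(84, 128L * 1024 * 1024); // 10.5 GB
        double indexGb = indexPerNodeGb(29.9, 45);               // ~680 GB
        System.out.printf("cache/node:  %.1f GB%n", cacheGb);
        System.out.printf("index/node:  %.1f GB%n", indexGb);
        System.out.printf("cached:      %.1f%%%n", 100 * cacheGb / indexGb);
        System.out.printf("RAM for 50%%: %.0f GB/node%n", indexGb / 2);
    }
}
```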

What I believe caused our 'recovery, fail, retry loop' was one of our
servers died.  This caused HDFS to start to replicate blocks across the
cluster and produced a lot of network activity.  When this happened, I
believe there was high network contention for specific nodes in the
cluster
and their network interfaces became pegged and requests for HDFS blocks
timed out.  When that happened, SolrCloud went into recovery which
caused
more network traffic.  Fun stuff.

-Joe


On 11/22/2017 11:44 AM, Shawn Heisey wrote:

On 11/22/2017 6:44 AM, Joe Obernberger wrote:

Right now, we have a relatively small block cache due to the
requirements that the servers run other software.  We tried to find
the best balance between block cache size, and RAM for programs, while
still giving enough for local FS cache.  This came out to be 84 128M
blocks - or about 10G for the cache per node (45 nodes total).

How much data is being handled on a server with 10GB allocated for
caching HDFS data?

The first message in this thread says 

Re: Solr on HDFS vs local storage - Benchmarking

2017-11-22 Thread Hendrik Haddorp
We actually use no auto warming. Our collections are pretty small and 
the query performance is not really a problem so far. We are using lots 
of collections and most Solr caches seem to be per core and not global 
so we also have a problem with caching. I have to test the HDFS cache 
some more as that should work cross collections.


We also had an HDFS setup already, so it looked like a good option to not 
lose data. Earlier we had a few cases where we lost machines, so HDFS 
looked safer in that regard.


I would expect that the HDFS performance is also quite good if you have 
lots of document adds and not-so-frequent commits. Frequent adds with 
commits, which is likely not good in general anyway, do look quite a 
bit slower than local storage so far. As we didn't see that in our 
earlier tests, which were more query focused, I said it largely depends 
on what you are doing.


Hendrik

On 22.11.2017 18:41, Erick Erickson wrote:

In my experience, for relatively static indexes the performance is
roughly similar. Once the data is read from whatever data source it's
in memory, where the data came from is (largely) secondary in
importance.

In cases where there's a lot of I/O I expect HDFS to be slower, this
fits Hendrik's observation: "We now had a pattern with lots of small
updates and commits and that seems to be quite a bit slower". He's
merging segments and (presumably) autowarming frequently, implying
lots of I/O and HDFS adds an extra layer.

Personally I'd use whichever is most convenient and see if the
performance was "good enough". I wouldn't recommend _installing_ HDFS
just to use it with Solr, why add another complication? If you need
the redundancy add replicas. If you already have the HDFS
infrastructure in place and using HDFS is easier than local storage,
feel free

Best,
Erick


On Wed, Nov 22, 2017 at 8:06 AM, Greenhorn Techie
<greenhorntec...@gmail.com> wrote:

Hendrik,

Thanks for your response.

Regarding "But this seems to greatly depend on how your setup looks like
and what actions you perform": may I know what factors influence this
and what considerations need to be taken into account?

Thanks

On Wed, 22 Nov 2017 at 14:16 Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:


We did some testing and the performance was, strangely, even better with
HDFS than with the local file system. But this seems to greatly
depend on what your setup looks like and what actions you perform. We now
had a pattern with lots of small updates and commits, and that seems to be
quite a bit slower. We are about to do performance testing on that now.

The reason we switched to HDFS was largely connected to us using Docker
and Marathon/Mesos. With HDFS the data is in a shared file system and
thus it is possible to move the replica to a different instance on a
different host.

regards,
Hendrik

On 22.11.2017 14:59, Greenhorn Techie wrote:

Hi,

Good Afternoon!!

While the discussion around issues related to "Solr on HDFS" is live, I
would like to understand if anyone has done any performance benchmarking
for both Solr indexing and search between HDFS vs local file system.

Also, from experience, what would the community folks suggest? Solr on
local file system or Solr on HDFS? Has anyone done a comparative study of
these choices?

Thanks







Re: Solr on HDFS vs local storage - Benchmarking

2017-11-22 Thread Hendrik Haddorp
We did some testing and the performance was, strangely, even better with 
HDFS than with the local file system. But this seems to greatly 
depend on what your setup looks like and what actions you perform. We now 
had a pattern with lots of small updates and commits, and that seems to be 
quite a bit slower. We are about to do performance testing on that now.


The reason we switched to HDFS was largely connected to us using Docker 
and Marathon/Mesos. With HDFS the data is in a shared file system and 
thus it is possible to move the replica to a different instance on a 
different host.


regards,
Hendrik

On 22.11.2017 14:59, Greenhorn Techie wrote:

Hi,

Good Afternoon!!

While the discussion around issues related to "Solr on HDFS" is live, I
would like to understand if anyone has done any performance benchmarking
for both Solr indexing and search between HDFS vs local file system.

Also, from experience, what would the community folks suggest? Solr on
local file system or Solr on HDFS? Has anyone done a comparative study of
these choices?

Thanks





Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-22 Thread Hendrik Haddorp

Hi Joe,

sorry, I have not seen that problem. I would normally not delete a 
replica if the shard is down but only if there is an active shard. 
Without an active leader the replica should not be able to recover. I 
also just had a case where all replicas of a shard stayed in down state 
and restarts didn't help. This was however also caused by lock files. 
Once I cleaned them up and restarted all Solr instances that had a 
replica they recovered.


For the lock files I discovered that the index is not always in the 
"index" folder but can also be in an "index.<timestamp>" folder. There can 
be an "index.properties" file in the "data" directory in HDFS and this 
contains the correct index folder name.


If you are really desperate you could also delete all but one replica so 
that the leader election is quite trivial. But this does of course 
increase the risk of finally losing the data quite a bit. So I would 
try looking into the code to figure out what the problem is here and 
maybe compare the state in HDFS and ZK with a shard that works.


regards,
Hendrik

On 21.11.2017 23:57, Joe Obernberger wrote:
Hi Hendrik - the shards in question have three replicas.  I tried 
restarting each one (one by one) - no luck.  No leader is found. I 
deleted one of the replicas and added a new one, and the new one also 
shows as 'down'.  I also tried the FORCELEADER call, but that had no 
effect.  I checked the OVERSEERSTATUS, but there is nothing unusual 
there.  I don't see anything useful in the logs except the error:


org.apache.solr.common.SolrException: Error getting leader from zk for 
shard shard21
    at 
org.apache.solr.cloud.ZkController.getLeader(ZkController.java:996)

    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at 
org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader 
props
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)
    at 
org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)

    ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)

    ... 9 more

Can I modify zookeeper to force a leader?  Is there any other way to 
recover from this?  Thanks very much!


-Joe


On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is 
left active, the easiest fix is to delete the replica and create a 
new one. When all replicas are down it helps most of the time to 
restart one of the nodes that contains a replica in down state. If 
that also doesn't get the replica to recover I would check the logs 
of the node and also those of the overseer node. I have seen the same 
issue on Solr using local storage. The main HDFS-related issues we 
had so far were those lock files, and that if you delete and recreate 
collections/cores it sometimes happens that the data was not 
cleaned up in HDFS, which then causes a conflict.


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have 
no comparison.  What we've been doing is keeping two main 
collections - all data, and the last 30 days of data.  Then we 
handle queries based on date range. The 30 day index is 
significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issues with HDFS at the 
moment. We are doing lots of soft commits for NRT search. Those 
seem to be slower than with local storage.

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We sometimes also have replicas not recovering. If one replica is left 
active, the easiest fix is to delete the replica and create a new 
one. When all replicas are down it helps most of the time to restart one 
of the nodes that contains a replica in down state. If that also doesn't 
get the replica to recover I would check the logs of the node and also 
those of the overseer node. I have seen the same issue on Solr using 
local storage. The main HDFS-related issues we had so far were those lock 
files, and that if you delete and recreate collections/cores it sometimes 
happens that the data was not cleaned up in HDFS, which then causes a conflict.


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - 
all data, and the last 30 days of data.  Then we handle queries based 
on date range.  The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issues with HDFS at the moment. 
We are doing lots of soft commits for NRT search. Those seem to be 
slower than with local storage. The investigation is however not 
really far along yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 
1 it is pretty much game over on our setup. In that case it seems 
to be best to stop all nodes, clear the queue in ZK and then restart 
the nodes one by one with a gap of like 5min. That normally recovers 
pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the 
first place.  We had a server die over the weekend, but it's just 
one out of ~50.  Every shard is 3x replicated (and 3x replicated in 
HDFS...so 9 copies).  It was at this point that we noticed lots of 
network activity, and most of the shards in this recovery, fail, 
retry loop.  That is when we decided to shut it down resulting in 
zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't 
seem to have any effect on the shards that have no leader. Kinda out 
of ideas for that problem.  If I can get the cluster back up, I'll 
try a lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.
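A solrconfig.xml sketch of what Erick describes; the 15-second value is just one point in his suggested 15-60 second range and would need tuning for an HDFS setup:

```xml
<!-- Hypothetical autoCommit settings following the advice above: a short
     hard-commit interval keeps tlogs small, and openSearcher=false keeps
     the commit cheap (fsync + new segment, no new searcher). -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>           <!-- hard commit every 15 seconds -->
    <openSearcher>false</openSearcher> <!-- don't open a searcher on hard commit -->
  </autoCommit>
</updateHandler>
```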

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

the write.lock issue I see as well when Solr has not been stopped 
gracefully. The write.lock files are then left in HDFS as they do not get 
removed automatically when the client disconnects, the way an ephemeral 
node in ZooKeeper would be. Unfortunately Solr does also not realize that 
it should be owning the lock.

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We actually also have some performance issues with HDFS at the moment. We 
are doing lots of soft commits for NRT search. Those seem to be slower 
than with local storage. The investigation is however not really far along yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that causes 
problems with the overseer queue, which can lead to the queue getting 
out of control and Solr pretty much dying. We are still on Solr 6.3. 6.6 
has some improvements and should handle these actions faster. I would 
check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 1 
it is pretty much game over on our setup. In that case it seems to be 
best to stop all nodes, clear the queue in ZK and then restart the nodes 
one by one with a gap of like 5min. That normally recovers pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 9 
copies).  It was at this point that we noticed lots of network 
activity, and most of the shards in this recovery, fail, retry loop.  
That is when we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem 
to have any effect on the shards that have no leader.  Kinda out of 
ideas for that problem.  If I can get the cluster back up, I'll try a 
lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

the write.lock issue I see as well when Solr has not been stopped
gracefully. The write.lock files are then left in HDFS as they do not get
removed automatically when the client disconnects, the way an ephemeral
node in ZooKeeper would be. Unfortunately Solr does also not realize that
it should be owning the lock, as it is marked in the state stored in
ZooKeeper as the owner, and is also not willing to retry, which is why you
need to restart the whole Solr instance after the cleanup. I added some
logic to my Solr start-up script which scans the lock files in HDFS,
compares that with the state in ZooKeeper, and then deletes all lock files
that belong to the node that I'm starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is split
across 100 shards with 3 replicas each.  The issue that we're running into
is when restarting the solr6 cluster.  The shards go into recovery and
start to utilize nearly all of their network interfaces.  If we start too
many of the nodes at once, the shards will go into a recovery, fail, and
retry loop and never come up.  The errors are related to HDFS not
responding fast enough and warnings from the DFSClient.  If we stop a node
when this is happening, the script will force a stop (180 second timeout)
and upon restart, we have lock files (write.lock) inside of HDFS.

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
Unfortunately I can not upload my cleanup code but the steps I'm doing 
are quite easy. I wrote it in Java using the HDFS API and Curator for 
ZooKeeper. Steps are:
    - read out the children of /collections in ZK so you know all the 
collection names

    - read /collections/<collection>/state.json to get the state
    - find the replicas in the state and filter out those that have a 
"node_name" matching your local node (the node name is basically a 
combination of your host name and the solr port)
    - if the replica data has "dataDir" set then you basically only 
need to add "index/write.lock" to it and you have the lock location
    - if "dataDir" is not set (not really sure why) then you need to 
construct it yourself: <data dir>/<collection>/<core node name>/data/index/write.lock

    - if the lock file exists, delete it

I believe there is a small race condition in case you use replica 
auto-failover. So I try to keep the time between checking the state in 
ZooKeeper and deleting the lock file as short as possible: rather than 
first determining all lock file locations and only then deleting them, I 
delete each one while checking the state.
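The path construction in those steps can be sketched in plain Java. Note that the directory layout (`<data dir>/<collection>/<core node name>/data/index/write.lock`) and the helper names here are my own reconstruction from the description above, not Hendrik's actual cleanup code; verify them against your own HDFS layout before deleting anything:

```java
import java.util.Optional;

// Sketch of the write.lock path logic described in the steps above.
public class WriteLockPath {

    // Case 1: the replica state carries an explicit "dataDir";
    // just append "index/write.lock" to it.
    static String fromDataDir(String dataDir) {
        String base = dataDir.endsWith("/") ? dataDir : dataDir + "/";
        return base + "index/write.lock";
    }

    // Case 2: no "dataDir"; build the path from the collection
    // and core node name (assumed layout).
    static String fromNames(String hdfsBase, String collection, String coreNodeName) {
        return String.format("%s/%s/%s/data/index/write.lock",
                hdfsBase, collection, coreNodeName);
    }

    // Pick whichever variant applies, mirroring the decision in the mail.
    static String lockPath(String hdfsBase, String collection,
                           String coreNodeName, Optional<String> dataDir) {
        return dataDir.map(WriteLockPath::fromDataDir)
                      .orElseGet(() -> fromNames(hdfsBase, collection, coreNodeName));
    }
}
```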


regards,
Hendrik

On 21.11.2017 19:53, Joe Obernberger wrote:
A clever idea.  Normally what we do when we need to do a restart, is 
to halt indexing, and then wait about 30 minutes.  If we do not wait, 
and stop the cluster, the default scripts 180 second timeout is not 
enough and we'll have lock files to clean up.  We use puppet to start 
and stop the nodes, but at this point that is not working well since 
we need to start one node at a time.  With each one taking hours, this 
is a lengthy process!  I'd love to see your script!


This new error is now coming up - see screen shot.  For some reason 
some of the shards have no leader assigned:


http://lovehorsepower.com/SolrClusterErrors.jpg

-Joe


On 11/21/2017 1:34 PM, Hendrik Haddorp wrote:

Hi,

the write.lock issue I see as well when Solr has not been stopped 
gracefully. The write.lock files are then left in HDFS as they do 
not get removed automatically when the client disconnects, the way an 
ephemeral node in ZooKeeper would be. Unfortunately Solr does also not 
realize that it should be owning the lock, as it is marked in the state 
stored in ZooKeeper as the owner, and is also not willing to retry, which 
is why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start-up script which scans the lock files 
in HDFS, compares that with the state in ZooKeeper, and then deletes 
all lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index. The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The 
issue that we're running into is when restarting the solr6 cluster.  
The shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and 
upon restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete 
the write.lock files, and restart.  Usually this second restart is 
faster, but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). 
Should we have not used so many replicas with HDFS?  Is there a 
better way we should have built the solr6 cluster?


Thank you for any insight!

-Joe




---
This email has been checked for viruses by AVG.
http://www.avg.com







Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp

Hi,

the write.lock issue I see as well when Solr has not been stopped 
gracefully. The write.lock files are then left in HDFS as they do 
not get removed automatically when the client disconnects, the way an 
ephemeral node in ZooKeeper would be. Unfortunately Solr does also not 
realize that it should be owning the lock, as it is marked in the state 
stored in ZooKeeper as the owner, and is also not willing to retry, which 
is why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start-up script which scans the lock files in 
HDFS, compares that with the state in ZooKeeper, and then deletes all 
lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The issue 
that we're running into is when restarting the solr6 cluster.  The 
shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and upon 
restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is faster, 
but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). Should 
we have not used so many replicas with HDFS?  Is there a better way we 
should have built the solr6 cluster?


Thank you for any insight!

-Joe





Re: SolrJ DocCollection is missing config name

2017-11-12 Thread Hendrik Haddorp
An option is actually to do an explicit 
ClusterStatus.getClusterStatus().process(solr, collectionName) request 
and then get the config set name out of the result. This is a bit 
cumbersome but works.


On 12.11.2017 19:54, Hendrik Haddorp wrote:

Hi,

the SolrJ DocCollection object seems to contain all information from 
the cluster status except the name of the config set.

Is that a bug or on purpose?

The reason might be that everything in the DocCollection object 
originates from the state.json while the config set name is stored in 
the collection node directly in ZooKeeper. But so far I can't find a 
way to get the config set name using the CloudSolrClient.


regards,
Hendrik




SolrJ DocCollection is missing config name

2017-11-12 Thread Hendrik Haddorp

Hi,

the SolrJ DocCollection object seems to contain all information from the 
cluster status except the name of the config set.

Is that a bug or on purpose?

The reason might be that everything in the DocCollection object 
originates from the state.json while the config set name is stored in 
the collection node directly in ZooKeeper. But so far I can't find a way 
to get the config set name using the CloudSolrClient.


regards,
Hendrik


Re: solr core replication

2017-10-23 Thread Hendrik Haddorp

Hi Erick,

sorry for the slow reply. You are right, the information is not 
persisted. Once I do a restart there is no information about the 
replication source anymore. That explains why I could not find it 
anywhere persisted ;-) I thought I had tested that last week but must 
have not done so as it worked just fine now.


thanks,
Hendrik

On 20.10.2017 16:39, Erick Erickson wrote:

Does that persist even after you restart Solr on the target cluster?

And that clears up one bit of confusion I had, I didn't know how you
were having each shard on the target cluster use a different master URL
given they all use the same solrconfig file. I was guessing some magic with
system variables, but it turns out you were way ahead of me and
not configuring the replication in solrconfig at all.

But no, I know of no API level command that works to do what you're asking.
I also don't know where that data is persisted, I'm afraid you'll have to go
code-diving for all the help I can be

Using fetchindex this way in SolrCloud is something of an edge case. It'll
probably be around forever since replication is used as a fall-back when
a replica syncs, but there'll be some bits like this hanging around I'd guess.

Best,
Erick

On Thu, Oct 19, 2017 at 11:55 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi Erick,

that is actually the call I'm using :-)
If you invoke
http://solr_target_machine:port/solr/core/replication?command=details after
that you can see the replication status. But even after a Solr restart the
call still shows the replication relation and I would like to remove this so
that the core looks "normal" again.

regards,
Hendrik

On 20.10.2017 02:31, Erick Erickson wrote:

Little known trick:

The fetchIndex replication API call can take any parameter you specify
in your config. So you don't have to configure replication at all on
your target collection, just issue the replication API command with
masterUrl, something like:


http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core

NOTE, "core" above will be something like collection1_shard1_replica1

During the fetchindex, you won't be able to search on the target
collection although the source will be searchable.

Now, all that said this is just copying stuff. So let's say you've
indexed to your source cluster and set up your target cluster (but
don't index anything to the target or do the replication etc). Now if
you shut down the target cluster and just copy the entire data dir
from each source replica to each target replica then start all the
target Solr instances up you'll be fine.

Best,
Erick

On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

I want to transfer a Solr collection from one SolrCloud to another one.
For
that I create a collection in the target cloud using the same config set
as
on the source cloud but with a replication factor of one. After that I'm
using the Solr core API with a "replication?command=fetchindex" command
to
transfer the data. In the last step I'm increasing the replication
factor.
This seems to work fine so far. When I invoke
"replication?command=details"
I can see my replication setup and check if the replication is done. In
the
end I would like to remove this relation again but there does not seem to
be
an API call for that. Given that the replication should be a one time
replication according to the API on
https://lucene.apache.org/solr/guide/6_6/index-replication.html this
should
not be a big problem. It just does not look clean to me to leave this in
the
system. Is there anything I'm missing?

regards,
Hendrik






Re: solr core replication

2017-10-20 Thread Hendrik Haddorp

Hi Erick,

that is actually the call I'm using :-)
If you invoke 
http://solr_target_machine:port/solr/core/replication?command=details 
after that you can see the replication status. But even after a Solr 
restart the call still shows the replication relation and I would like 
to remove this so that the core looks "normal" again.


regards,
Hendrik

On 20.10.2017 02:31, Erick Erickson wrote:

Little known trick:

The fetchIndex replication API call can take any parameter you specify
in your config. So you don't have to configure replication at all on
your target collection, just issue the replication API command with
masterUrl, something like:

http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core

NOTE, "core" above will be something like collection1_shard1_replica1

During the fetchindex, you won't be able to search on the target
collection although the source will be searchable.

Now, all that said this is just copying stuff. So let's say you've
indexed to your source cluster and set up your target cluster (but
don't index anything to the target or do the replication etc). Now if
you shut down the target cluster and just copy the entire data dir
from each source replica to each target replica then start all the
target Solr instances up you'll be fine.

Best,
Erick

On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

I want to transfer a Solr collection from one SolrCloud to another one. For
that I create a collection in the target cloud using the same config set as
on the source cloud but with a replication factor of one. After that I'm
using the Solr core API with a "replication?command=fetchindex" command to
transfer the data. In the last step I'm increasing the replication factor.
This seems to work fine so far. When I invoke "replication?command=details"
I can see my replication setup and check if the replication is done. In the
end I would like to remove this relation again but there does not seem to be
an API call for that. Given that the replication should be a one time
replication according to the API on
https://lucene.apache.org/solr/guide/6_6/index-replication.html this should
not be a big problem. It just does not look clean to me to leave this in the
system. Is there anything I'm missing?

regards,
Hendrik




solr core replication

2017-10-19 Thread Hendrik Haddorp

Hi,

I want to transfer a Solr collection from one SolrCloud to another one. 
For that I create a collection in the target cloud using the same config 
set as on the source cloud but with a replication factor of one. After 
that I'm using the Solr core API with a "replication?command=fetchindex" 
command to transfer the data. In the last step I'm increasing the 
replication factor. This seems to work fine so far. When I invoke 
"replication?command=details" I can see my replication setup and check 
if the replication is done. In the end I would like to remove this 
relation again but there does not seem to be an API call for that. Given 
that the replication should be a one time replication according to the 
API on https://lucene.apache.org/solr/guide/6_6/index-replication.html 
this should not be a big problem. It just does not look clean to me to 
leave this in the system. Is there anything I'm missing?


regards,
Hendrik
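
For reference, the fetchindex call described above can be built up like
this; a minimal sketch in plain Java, where host, port, and core names are
placeholders (the masterUrl value should be URL-encoded when passed as a
query parameter):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the one-shot replication call discussed in this thread.
// "core" is the full core name, e.g. collection1_shard1_replica1.
public class FetchIndexUrl {
    static String buildFetchIndexUrl(String targetCoreUrl, String sourceCoreUrl) {
        // masterUrl is passed as a request parameter, so the target core
        // needs no replication config; the URL value must be encoded.
        return targetCoreUrl + "/replication?command=fetchindex&masterUrl="
                + URLEncoder.encode(sourceCoreUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildFetchIndexUrl(
                "http://solr_target_machine:8983/solr/collection1_shard1_replica1",
                "http://solr_source_machine:8983/solr/collection1_shard1_replica1"));
    }
}
```

An HTTP GET on the printed URL triggers the same one-time replication as
issuing the command from a browser or curl.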


Re: streaming with SolrJ

2017-09-28 Thread Hendrik Haddorp
hm, thanks, but why are all those withFunctionName calls required and 
how did you get to this?


On 28.09.2017 22:01, Susheel Kumar wrote:

I have this snippet with couple of functions e.g. if that helps

---
TupleStream stream;
List<Tuple> tuples;
StreamContext streamContext = new StreamContext();
SolrClientCache solrClientCache = new SolrClientCache();
streamContext.setSolrClientCache(solrClientCache);

StreamFactory factory = new StreamFactory()
    .withCollectionZkHost("gettingstarted", "localhost:2181")
    .withFunctionName("search", CloudSolrStream.class)
    .withFunctionName("select", SelectStream.class)
    .withFunctionName("add", AddEvaluator.class)
    .withFunctionName("if", IfThenElseEvaluator.class)
    .withFunctionName("gt", GreaterThanEvaluator.class)
    .withFunctionName("let", LetStream.class)
    .withFunctionName("get", GetStream.class)
    .withFunctionName("echo", EchoStream.class)
    .withFunctionName("merge", MergeStream.class)
    .withFunctionName("sort", SortStream.class)
    .withFunctionName("tuple", TupStream.class)
    .withFunctionName("rollup", RollupStream.class)
    .withFunctionName("hashJoin", HashJoinStream.class)
    .withFunctionName("complement", ComplementStream.class)
    .withFunctionName("fetch", FetchStream.class)
    .withFunctionName("having", HavingStream.class)
//  .withFunctionName("eq", EqualsEvaluator.class)
    .withFunctionName("count", CountMetric.class)
    .withFunctionName("facet", FacetStream.class)
    .withFunctionName("sum", SumMetric.class)
    .withFunctionName("unique", UniqueStream.class)
    .withFunctionName("uniq", UniqueMetric.class)
    .withFunctionName("innerJoin", InnerJoinStream.class)
    .withFunctionName("intersect", IntersectStream.class)
    .withFunctionName("replace", ReplaceOperation.class);

try {
    clause = getClause();
    stream = factory.constructStream(clause);
    stream.setStreamContext(streamContext);
    tuples = getTuples(stream);

    for (Tuple tuple : tuples) {
        System.out.println(tuple.getString("id"));
        System.out.println(tuple.getString("business_email_s"));
    }

    System.out.println("Total tuples returned " + tuples.size());


---
private static String getClause() {
String clause = "select(search(gettingstarted,\n" +
"q=*:* NOT personal_email_s:*,\n" +
"fl=\"id,business_email_s\",\n" +
"sort=\"business_email_s asc\"),\n" +
"id,\n" +
"business_email_s,\n" +
"personal_email_s,\n" +
"replace(personal_email_s,null,withField=business_email_s)\n" +
")";
return clause;
}


On Thu, Sep 28, 2017 at 3:35 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:


Hi,

I'm trying to use the streaming API via SolrJ but have some trouble with
the documentation and samples. In the reference guide I found the below
example in
http://lucene.apache.org/solr/guide/6_6/streaming-expressions.html.
Problem is that "withStreamFunction" does not seem to exist.
There is "withFunctionName", which would match the arguments but there is
no documentation in the JavaDoc nor is the sample stating why I would need
all those "with" calls if pretty much everything is also in the last
"constructStream" method call. I was planning to retrieve a few fields for
all documents in a collection but have trouble to figure out what is the
correct way to do so. The documentation also uses "/export" and "/search",
with little explanation on the differences. Would really appreciate a
pointer to some simple samples.

The org.apache.solr.client.solrj.io package provides Java classes that
compile streaming expressions into streaming API objects. These classes can
be used to execute streaming expressions from inside a Java application.
For example:

StreamFactory streamFactory = new 
StreamFactory().withCollectionZkHost("collection1",
zkServer.getZkAddress())
 .withStreamFunction("search", CloudSolrStream.class)
 .withStreamFunction("unique", UniqueStream.class)
 .withStreamFunction("top", RankStream.class)
 .withStreamFunction("group", ReducerStream.class)
 .withStreamFunction("parallel", ParallelStream.class);

ParallelStream pstream = (ParallelStream)streamFactory.
constructStream("parallel(collection1, group(search(collection1,
q=\"*:*\", fl=\"id,a_s,a_i,a_f\", sort=\"a_s asc,a_f asc\",
partitionKeys=\"a_s\"), by=\"a_s asc\"), workers=\"2\",
zkHost=\""+zkHost+"\", sort=\"a_s asc\")");

regards,
Hendrik





streaming with SolrJ

2017-09-28 Thread Hendrik Haddorp

Hi,

I'm trying to use the streaming API via SolrJ but have some trouble with 
the documentation and samples. In the reference guide I found the below 
example in 
http://lucene.apache.org/solr/guide/6_6/streaming-expressions.html. 
Problem is that "withStreamFunction" does not seem to exist. There is 
"withFunctionName", which would match the arguments but there is no 
documentation in the JavaDoc nor is the sample stating why I would need 
all those "with" calls if pretty much everything is also in the last 
"constructStream" method call. I was planning to retrieve a few fields 
for all documents in a collection but have trouble to figure out what is 
the correct way to do so. The documentation also uses "/export" and 
"/search", with little explanation on the differences. Would really 
appreciate a pointer to some simple samples.


The org.apache.solr.client.solrj.io package provides Java classes that 
compile streaming expressions into streaming API objects. These classes 
can be used to execute streaming expressions from inside a Java 
application. For example:


StreamFactory streamFactory = new 
StreamFactory().withCollectionZkHost("collection1", zkServer.getZkAddress())

.withStreamFunction("search", CloudSolrStream.class)
.withStreamFunction("unique", UniqueStream.class)
.withStreamFunction("top", RankStream.class)
.withStreamFunction("group", ReducerStream.class)
.withStreamFunction("parallel", ParallelStream.class);

ParallelStream pstream = 
(ParallelStream)streamFactory.constructStream("parallel(collection1, 
group(search(collection1, q=\"*:*\", fl=\"id,a_s,a_i,a_f\", sort=\"a_s 
asc,a_f asc\", partitionKeys=\"a_s\"), by=\"a_s asc\"), workers=\"2\", 
zkHost=\""+zkHost+"\", sort=\"a_s asc\")");


regards,
Hendrik


Re: generate field name in query

2017-09-13 Thread Hendrik Haddorp

You should be able to just use
price_owner_float:[100 TO 200]  OR price_customer_float:[100 TO 200]

If the document doesn't have the field the condition is false.
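
If strict priority is wanted instead (only fall back to
price_customer_float when the document has no price_owner_float), the same
idea can be expressed with plain boolean clauses; a sketch, assuming the
lucene query parser and that both fields are indexed:

```
price_owner_float:[100 TO 200]
OR ((*:* -price_owner_float:*) AND price_customer_float:[100 TO 200])
```

The leading *:* is needed because a purely negative clause does not match
on its own; "-price_owner_float:*" selects documents missing the field.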

On 12.09.2017 23:14, xdzgor1 wrote:

Rick Leir-2 wrote

Peter
The common setup is to use copyfield from all your fields into a 'grab
bag' containing everything, and then to search on it alone. Cheers -- Rick

On August 2, 2017 7:31:10 AM EDT, Peter Kirk 
pk@
 wrote:

Hi - is it possible to create a query (or fq) which generates the field
to search on, based on whether or not the document has that field?

Eg. Search for documents with prices in the range 100 - 200, using
either the field "price_owner_float" or "price_customer_float" (if a
document has a field "price_owner_float" then use that, otherwise use
the field "price_customer_float").

This gives a syntax error:
fq=if(exists(price_owner_float),price_owner_float,price_customer_float):[100
TO 200]

Thanks,
Peter

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Thanks, but I don't really think a general copy-field is what I want. I want
to specifically search for particular values in named fields.

For example:
if the document has a "field1" then use "field1:[1 TO 100]"; but if there is
no "field1", then check if there is a "field2"; if there is a "field2" then
use "field2:[1 TO 100]; but if there is no "field2", then use "field3:[1 TO
100].

Something like:
?q=*:*&fq=if(exists(field1),field1:[1 TO 100],if(exists(field2),field2:[1 TO
100], field3:[1 TO 100]))


But this does not work.
Is it even possible?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp
I didn't mean to say that the fix is not in 7.0. I just stated that I 
do not see it listed in the release notes 
(https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12335718).


Thanks for explaining the release process.

regards,
Hendrik

On 10.09.2017 17:32, Erick Erickson wrote:

There will be no 6.7. Once the X+1 version is released, all past fixes
are applied as minor releases to the last released version of the
previous major release. So now that 7.0 has been cut, there might be a
6.6.2 (6.6.1 was just released) but no 6.7. Current un-released JIRAs
are parked on the 6.x (as opposed to branch_6_6) for convenience. If
anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" clearly states
so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.


If you need it in 6x you have a couple of options:

1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. I do however also not see
it listed in the current release notes for 6.7 nor 7.0:
 https://issues.apache.org/jira/projects/SOLR/versions/12340568
 https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea already when 6.7 or 7.0 will be released?

thanks,
Hendrik


On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma <markus.jel...@openindex.io>
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has
a
good chance to result in an OOM-Error. To investigate that further I
did
a simple test:
  - Start solr with a 2GB heap and 1GB Metaspace
  - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
  - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr
6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like
the
space left after GC cycles gets less and less. Then Solr gets very
slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading
a
trivial collection.

regards,
Hendrik





Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. I do however also not 
see it listed in the current release notes for 6.7 nor 7.0:

https://issues.apache.org/jira/projects/SOLR/versions/12340568
https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea already when 6.7 or 7.0 will be released?

thanks,
Hendrik

On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood  wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma  wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma 
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp 
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has a
good chance to result in an OOM-Error. To investigate that further I did
a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the
space left after GC cycles gets less and less. Then Solr gets very slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading a
trivial collection.

regards,
Hendrik





Re: Solr memory leak

2017-08-30 Thread Hendrik Haddorp
Did you get an answer? Would really be nice to have that in the next 
release.


On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood  wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma  wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma 
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp 
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has a
good chance to result in an OOM-Error. To investigate that further I did
a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the
space left after GC cycles gets less and less. Then Solr gets very slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading a
trivial collection.

regards,
Hendrik





Solr memory leak

2017-08-28 Thread Hendrik Haddorp

Hi,

we noticed that triggering collection reloads on many collections has a 
good chance to result in an OOM-Error. To investigate that further I did 
a simple test:

- Start solr with a 2GB heap and 1GB Metaspace
- create a trivial collection with a few documents (I used only 2 
fields and 100 documents)
- trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
worked better but also failed after 1100 loops.


When looking at the memory usage on the Solr dashboard it looks like the 
space left after GC cycles gets less and less. Then Solr gets very slow, 
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
my last run this was actually for the Metaspace. So it looks like more 
and more heap and metaspace is being used by just constantly reloading a 
trivial collection.


regards,
Hendrik
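
The reload loop from the test can be sketched against the Collections API
as follows; base URL and collection name are placeholders, and the actual
test used SolrJ rather than plain HTTP:

```java
// Sketch of the reload-in-a-loop reproduction described above.
public class ReloadLoop {
    static String buildReloadUrl(String solrBaseUrl, String collection) {
        // RELOAD is a standard Collections API action.
        return solrBaseUrl + "/admin/collections?action=RELOAD&name=" + collection;
    }

    public static void main(String[] args) {
        String url = buildReloadUrl("http://localhost:8983/solr", "trivial");
        for (int i = 0; i < 1100; i++) {
            // Against a live cluster each iteration would issue the request
            // (e.g. via java.net.http.HttpClient) and wait for the response.
            // The failures described above appeared around iteration 250
            // (Solr 6.3) and 1100 (Solr 6.6).
        }
        System.out.println(url);
    }
}
```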


Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
It is a known problem: 
https://cwiki.apache.org/confluence/display/CURATOR/TN4


There are multiple JIRAs around this, like the one I pointed to earlier: 
https://issues.apache.org/jira/browse/SOLR-10524

There it states:
This JIRA is to break out that part of the discussion as it might be an 
easy win whereas "eliminating the Overseer queue" would be quite an 
undertaking.


I assume this issue only shows up if you have many cores. There are also 
some config settings that might have an effect but I have not really 
figured out the magic settings. As said Solr 6.6 might also work better.


On 22.08.2017 19:18, Jeff Courtade wrote:

righto,

thanks very much for your help clarifying this. I am not alone :)

I have been looking at this for a few days now.

I am seeing people who have experienced this issue going back to solr
version 4.x.

I am wondering if it is an underlying issue with the way the q is managed.

I would think that it should not be able to be put into a state that is not
recoverable except destructively.

If you have a very active  solr cluster this could cause data loss I am
thinking.






--
Thanks,

Jeff Courtade
M: 240.507.6116

On Tue, Aug 22, 2017 at 1:14 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:


- stop all solr nodes
- start zk with the new jute.maxbuffer setting
- start a zk client, like zkCli, with the changed jute.maxbuffer setting
and check that you can read out the overseer queue
- clear the queue
- restart zk with the normal settings
- slowly start solr

On 22.08.2017 15:27, Jeff Courtade wrote:


I set jute.maxbuffer on the so hosts should this be done to solr as well?

Mine is happening in a severely memory constrained end as well.

Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:53 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

We have Solr and ZK running in Docker containers. There is no more than

one Solr/ZK node per host but Solr and ZK node can run on the same host.
So
Solr and ZK are spread out separately.

I have not seen this problem during normal processing just when we
recycle
nodes or when we have nodes fail, which is pretty much always caused by
being out of memory, which again is unfortunately a bit complex in
Docker.
When nodes come up they add quite a few tasks to the overseer queue. I
assume one task for every core. We have about 2000 cores on each node. If
nodes come up too fast the queue might grow to a few thousand entries. At
maybe 1 entries it usually reaches the point of no return and Solr is
just adding more tasks than it is able to process. So it's best to pull the
the
plug at that point as you will not have to play with jute.maxbuffer to
get
Solr up again.

We are using Solr 6.3. There is some improvements in 6.6:
  https://issues.apache.org/jira/browse/SOLR-10524
  https://issues.apache.org/jira/browse/SOLR-10619

On 22.08.2017 14:41, Jeff Courtade wrote:

Thanks very much.

I will followup when we try this.

Im curious in the env this is happening to you are the zookeeper
servers residing on solr nodes? Are the solr nodes underpowered ram and
or
cpu?

Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:30 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

I'm always using a small Java program to delete the nodes directly. I


assume you can also delete the whole node but that is nothing I have
tried
myself.

On 22.08.2017 14:27, Jeff Courtade wrote:

So ...


Using the zkCli.sh i have the jute.maxbuffer setup so I can list it
now.

Can I

 rmr /overseer/queue

Or do i need to delete individual entries?

Will

rmr /overseer/queue/*

work?




Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:20 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

When Solr is stopped it did not cause a problem so far.

I cleared the queue also a few times while Solr was still running.

That
also didn't result in a real problem but some replicas might not come
up
again. In those case it helps to either restart the node with the
replicas
that are in state "down" or to remove the failed replica and then
recreate
it. But as said, clearing it when Solr is stopped worked fine so far.

On 22.08.2017 14:03, Jeff Courtade wrote:

How does the cluster react to the overseer q entries disapeering?



Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net
wrote:

Hi Jeff,

we ran into that a few times already. We have lots of collections
and


when
nodes get started too fast the overseer queue grows faster than
Solr
can
process it. At some point Solr tries to redo things like leaders
votes
and
adds new tasks to the list, which then gets longer and longer. Once
it
is
too long you can not read out the data anymore but Solr is still
adding
tasks. In case you already reached that point you have to start
ZooKeeper
and the ZooKeeper client with an increased "jute.maxbuffer"
value. I
usually double it 

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp

- stop all solr nodes
- start zk with the new jute.maxbuffer setting
- start a zk client, like zkCli, with the changed jute.maxbuffer setting 
and check that you can read out the overseer queue

- clear the queue
- restart zk with the normal settings
- slowly start solr
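
As a config sketch of the jute.maxbuffer part of those steps (the value and
file locations are examples; the property defaults to roughly 1 MB and has
to match on server and client):

```
# ZooKeeper server side, e.g. in conf/zookeeper-env.sh or the start script:
SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

# zkCli with the same setting, then inspect and clear the queue
# (with all Solr nodes stopped):
CLIENT_JVMFLAGS="-Djute.maxbuffer=4194304" bin/zkCli.sh -server zk1:2181
#   ls /overseer/queue
#   rmr /overseer/queue        ("deleteall" on ZooKeeper 3.5+)
```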

On 22.08.2017 15:27, Jeff Courtade wrote:

I set jute.maxbuffer on the so hosts should this be done to solr as well?

Mine is happening in a severely memory constrained end as well.

Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:53 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net> wrote:


We have Solr and ZK running in Docker containers. There is no more than
one Solr/ZK node per host but Solr and ZK node can run on the same host. So
Solr and ZK are spread out separately.

I have not seen this problem during normal processing just when we recycle
nodes or when we have nodes fail, which is pretty much always caused by
being out of memory, which again is unfortunately a bit complex in Docker.
When nodes come up they add quite a few tasks to the overseer queue. I
assume one task for every core. We have about 2000 cores on each node. If
nodes come up too fast the queue might grow to a few thousand entries. At
maybe 1 entries it usually reaches the point of no return and Solr is
just adding more tasks than it is able to process. So it's best to pull the
plug at that point as you will not have to play with jute.maxbuffer to get
Solr up again.

We are using Solr 6.3. There is some improvements in 6.6:
 https://issues.apache.org/jira/browse/SOLR-10524
 https://issues.apache.org/jira/browse/SOLR-10619

On 22.08.2017 14:41, Jeff Courtade wrote:


Thanks very much.

I will followup when we try this.

Im curious in the env this is happening to you are the zookeeper
servers residing on solr nodes? Are the solr nodes underpowered ram and or
cpu?

Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:30 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

I'm always using a small Java program to delete the nodes directly. I

assume you can also delete the whole node but that is nothing I have
tried
myself.

On 22.08.2017 14:27, Jeff Courtade wrote:

So ...

Using the zkCli.sh i have the jute.maxbuffer setup so I can list it now.

Can I

rmr /overseer/queue

Or do i need to delete individual entries?

Will

rmr /overseer/queue/*

work?




Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:20 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

When Solr is stopped it did not cause a problem so far.


I cleared the queue also a few times while Solr was still running. That
also didn't result in a real problem but some replicas might not come
up
again. In those cases it helps to either restart the node with the
replicas
that are in state "down" or to remove the failed replica and then
recreate
it. But as said, clearing it when Solr is stopped worked fine so far.

On 22.08.2017 14:03, Jeff Courtade wrote:

How does the cluster react to the overseer q entries disapeering?



Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

Hi Jeff,

we ran into that a few times already. We have lots of collections and

when
nodes get started too fast the overseer queue grows faster than Solr
can
process it. At some point Solr tries to redo things like leaders
votes
and
adds new tasks to the list, which then gets longer and longer. Once
it
is
too long you can not read out the data anymore but Solr is still
adding
tasks. In case you already reached that point you have to start
ZooKeeper
and the ZooKeeper client with an increased "jute.maxbuffer" value. I
usually double it until I can read out the queue again. After that I
delete
all entries in the queue and then start the Solr nodes one by one,
like
every 5 minutes.

regards,
Hendrik

On 22.08.2017 13:42, Jeff Courtade wrote:

Hi,

I have an issue with what seems to be a blocked up /overseer/queue

There are 700k + entries.

Solr cloud 6.x

You cannot addreplica or deletereplica the commands time out.

Full stop and start of solr and zookeeper does not clear it.

Is it safe to use the zookeeper supplied zkCli.sh to simple rmr the
/overseer/queue ?


Jeff Courtade
M: 240.507.6116









Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
We have Solr and ZK running in Docker containers. There is no more than 
one Solr/ZK node per host, but a Solr and a ZK node can run on the same host. 
So Solr and ZK are spread out separately.


I have not seen this problem during normal processing just when we 
recycle nodes or when we have nodes fail, which is pretty much always 
caused by being out of memory, which again is unfortunately a bit 
complex in Docker. When nodes come up they add quite a few tasks to the 
overseer queue. I assume one task for every core. We have about 2000 
cores on each node. If nodes come up too fast the queue might grow to a 
few thousand entries. At maybe 1 entries it usually reaches the 
point of no return and Solr is just adding more tasks than it is able to 
process. So it's best to pull the plug at that point as you will not 
have to play with jute.maxbuffer to get Solr up again.


We are using Solr 6.3. There is some improvements in 6.6:
https://issues.apache.org/jira/browse/SOLR-10524
https://issues.apache.org/jira/browse/SOLR-10619

On 22.08.2017 14:41, Jeff Courtade wrote:

Thanks very much.

I will followup when we try this.

Im curious in the env this is happening to you are the zookeeper
servers residing on solr nodes? Are the solr nodes underpowered ram and or
cpu?

Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:30 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net> wrote:


I'm always using a small Java program to delete the nodes directly. I
assume you can also delete the whole node but that is nothing I have tried
myself.

On 22.08.2017 14:27, Jeff Courtade wrote:


So ...

Using zkCli.sh, I have jute.maxbuffer set up so I can list it now.

Can I

   rmr /overseer/queue

Or do I need to delete individual entries?

Will

rmr /overseer/queue/*

work?




Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:20 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

When Solr is stopped it has not caused a problem so far.

I also cleared the queue a few times while Solr was still running. That
also didn't result in a real problem, but some replicas might not come up
again. In those cases it helps to either restart the node with the
replicas
that are in state "down" or to remove the failed replica and then
recreate
it. But as said, clearing it when Solr is stopped has worked fine so far.

On 22.08.2017 14:03, Jeff Courtade wrote:

How does the cluster react to the overseer queue entries disappearing?



Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

Hi Jeff,


we ran into that a few times already. We have lots of collections, and
when nodes get started too fast the overseer queue grows faster than Solr
can process it. At some point Solr tries to redo things like leader votes
and adds new tasks to the list, which then gets longer and longer. Once it
is too long you cannot read out the data anymore, but Solr is still adding
tasks. In case you have already reached that point you have to start
ZooKeeper
and the ZooKeeper client with an increased "jute.maxbuffer" value. I
usually double it until I can read out the queue again. After that I
delete
all entries in the queue and then start the Solr nodes one by one, like
every 5 minutes.

regards,
Hendrik

On 22.08.2017 13:42, Jeff Courtade wrote:

Hi,


I have an issue with what seems to be a blocked up /overseer/queue

There are 700k + entries.

Solr cloud 6.x

You cannot addreplica or deletereplica; the commands time out.

Full stop and start of solr and zookeeper does not clear it.

Is it safe to use the ZooKeeper-supplied zkCli.sh to simply rmr the
/overseer/queue?


Jeff Courtade
M: 240.507.6116








Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
I'm always using a small Java program to delete the nodes directly. I 
assume you can also delete the whole node, but that is not something I 
have tried myself.


On 22.08.2017 14:27, Jeff Courtade wrote:

So ...

Using zkCli.sh, I have jute.maxbuffer set up so I can list it now.

Can I

  rmr /overseer/queue

Or do I need to delete individual entries?

Will

rmr /overseer/queue/*

work?




Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:20 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net> wrote:


When Solr is stopped it has not caused a problem so far.
I also cleared the queue a few times while Solr was still running. That
also didn't result in a real problem, but some replicas might not come up
again. In those cases it helps to either restart the node with the replicas
that are in state "down" or to remove the failed replica and then recreate
it. But as said, clearing it when Solr is stopped has worked fine so far.

On 22.08.2017 14:03, Jeff Courtade wrote:


How does the cluster react to the overseer queue entries disappearing?



Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net>
wrote:

Hi Jeff,

we ran into that a few times already. We have lots of collections, and
when
nodes get started too fast the overseer queue grows faster than Solr can
process it. At some point Solr tries to redo things like leader votes
and
adds new tasks to the list, which then gets longer and longer. Once it is
too long you cannot read out the data anymore, but Solr is still adding
tasks. In case you have already reached that point you have to start ZooKeeper
and the ZooKeeper client with an increased "jute.maxbuffer" value. I
usually double it until I can read out the queue again. After that I
delete
all entries in the queue and then start the Solr nodes one by one, like
every 5 minutes.

regards,
Hendrik

On 22.08.2017 13:42, Jeff Courtade wrote:

Hi,

I have an issue with what seems to be a blocked up /overseer/queue

There are 700k + entries.

Solr cloud 6.x

You cannot addreplica or deletereplica; the commands time out.

Full stop and start of solr and zookeeper does not clear it.

Is it safe to use the ZooKeeper-supplied zkCli.sh to simply rmr the
/overseer/queue?


Jeff Courtade
M: 240.507.6116







Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp

When Solr is stopped it has not caused a problem so far.
I also cleared the queue a few times while Solr was still running. That 
also didn't result in a real problem, but some replicas might not come up 
again. In those cases it helps to either restart the node with the 
replicas that are in state "down" or to remove the failed replica and 
then recreate it. But as said, clearing it when Solr is stopped has 
worked fine so far.


On 22.08.2017 14:03, Jeff Courtade wrote:

How does the cluster react to the overseer queue entries disappearing?



Jeff Courtade
M: 240.507.6116

On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" <hendrik.hadd...@gmx.net> wrote:


Hi Jeff,

we ran into that a few times already. We have lots of collections, and when
nodes get started too fast the overseer queue grows faster than Solr can
process it. At some point Solr tries to redo things like leader votes and
adds new tasks to the list, which then gets longer and longer. Once it is
too long you cannot read out the data anymore, but Solr is still adding
tasks. In case you have already reached that point you have to start ZooKeeper
and the ZooKeeper client with an increased "jute.maxbuffer" value. I
usually double it until I can read out the queue again. After that I delete
all entries in the queue and then start the Solr nodes one by one, like
every 5 minutes.

regards,
Hendrik

On 22.08.2017 13:42, Jeff Courtade wrote:


Hi,

I have an issue with what seems to be a blocked up /overseer/queue

There are 700k + entries.

Solr cloud 6.x

You cannot addreplica or deletereplica; the commands time out.

Full stop and start of solr and zookeeper does not clear it.

Is it safe to use the ZooKeeper-supplied zkCli.sh to simply rmr the
/overseer/queue?


Jeff Courtade
M: 240.507.6116






Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp

Hi Jeff,

we ran into that a few times already. We have lots of collections, and 
when nodes get started too fast the overseer queue grows faster than 
Solr can process it. At some point Solr tries to redo things like 
leader votes and adds new tasks to the list, which then gets longer and 
longer. Once it is too long you cannot read out the data anymore, but 
Solr is still adding tasks. In case you have already reached that point 
you have to start ZooKeeper and the ZooKeeper client with an increased 
"jute.maxbuffer" value. I usually double it until I can read out the 
queue again. After that I delete all entries in the queue and then start 
the Solr nodes one by one, like every 5 minutes.
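As a rough shell sketch of that recovery procedure — the ZooKeeper host, the buffer size, and the CLIENT_JVMFLAGS/SERVER_JVMFLAGS variable names are assumptions based on a stock zkEnv.sh; adjust them for your installation:

```shell
# 1) Stop all Solr nodes first.

# 2) Raise jute.maxbuffer for both the ZK server and the ZK client so
#    the oversized queue can be read (double it until listing works):
export SERVER_JVMFLAGS="-Djute.maxbuffer=8388608"   # for the ZK server
export CLIENT_JVMFLAGS="-Djute.maxbuffer=8388608"   # for zkCli.sh

# 3) Inspect and then clear the overseer queue:
zkCli.sh -server zk1:2181 ls /overseer/queue
zkCli.sh -server zk1:2181 rmr /overseer/queue   # "deleteall" on ZK 3.5+

# 4) Start the Solr nodes one by one, a few minutes apart.
```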


regards,
Hendrik

On 22.08.2017 13:42, Jeff Courtade wrote:

Hi,

I have an issue with what seems to be a blocked up /overseer/queue

There are 700k + entries.

Solr cloud 6.x

You cannot addreplica or deletereplica; the commands time out.

Full stop and start of solr and zookeeper does not clear it.

Is it safe to use the ZooKeeper-supplied zkCli.sh to simply rmr the
/overseer/queue?


Jeff Courtade
M: 240.507.6116





Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp
_version_", 
response.getResults().get(0).get("_version_"));
 docs.add(document);
 updateRequest = new UpdateRequest();
 updateRequest.add(docs);
 client.request(updateRequest, collection);
 updateRequest = new UpdateRequest();
 updateRequest.commit(client, collection);
}

Maybe you can let us know more details how the update been made?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 10:36 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:


Hi,

I can't find anything about this in the Solr logs. On the caller side I
have this:
Error from server at http://x_shard1_replica2: version conflict for
x expected=1573538179623944192 actual=1573546159565176832
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
from server at http://x_shard1_replica2: version conflict for x
expected=1573538179623944192 actual=1573546159565176832
 at 
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1062)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 ...
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://x_shard1_replica2: version conflict for
x expected=1573538179623944192 actual=1573546159565176832
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at org.apache.solr.client.solrj.impl.CloudSolrClient.lambda$directUpdate$0(CloudSolrClient.java:742) ~[solr-solrj-6.3.0.jar:6.3.0
a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_131]
 at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
shalin - 2016-11-02 19:52:43]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[?:1.8.0_131]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[?:1.8.0_131]
 at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

The version "1573546159565176832" does not exist. It looks a bit like the
update was first creating a new value and then checking against it.

regards,
Hendrik


On 21.07.2017 18:21, Amrit Sarkar wrote:


Hendrik,

Can you list down the error snippet so that we can refer the code where
exactly that is happening.


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 9:50 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net
wrote:

Hi,

when I try to use an atom

Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp

Hi,

I can't find anything about this in the Solr logs. On the caller side I 
have this:
Error from server at http://x_shard1_replica2: version conflict for 
x expected=1573538179623944192 actual=1573546159565176832
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error 
from server at http://x_shard1_replica2: version conflict for x 
expected=1573538179623944192 actual=1573546159565176832
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1062) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]

...
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://x_shard1_replica2: version conflict for 
x expected=1573538179623944192 actual=1573546159565176832
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.lambda$directUpdate$0(CloudSolrClient.java:742) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_131]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) 
~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - 
shalin - 2016-11-02 19:52:43]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_131]

at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

The version "1573546159565176832" does not exist. It looks a bit like 
the update was first creating a new value and then checking against it.


regards,
Hendrik

On 21.07.2017 18:21, Amrit Sarkar wrote:

Hendrik,

Can you list down the error snippet so that we can refer the code where
exactly that is happening.


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 9:50 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:


Hi,

when I try to use an atomic update in conjunction with optimistic
concurrency Solr sometimes complains that the version I passed in does not
match. The version in my request, however, matches what is stored, and what
the exception states as the actual version does not exist in the collection
at all. Strangely, this only happens sometimes, but once it happens for a
collection it seems to stay like that. Any idea why that might happen?

I'm using Solr 6.3 in Cloud mode with SolrJ.

regards,
Hendrik





atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp

Hi,

when I try to use an atomic update in conjunction with optimistic 
concurrency Solr sometimes complains that the version I passed in does 
not match. The version in my request, however, matches what is stored, 
and what the exception states as the actual version does not exist in 
the collection at all. Strangely, this only happens sometimes, but once 
it happens for a collection it seems to stay like that. Any idea why 
that might happen?


I'm using Solr 6.3 in Cloud mode with SolrJ.

regards,
Hendrik
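For anyone trying to reproduce this, the combination being discussed can also be exercised without SolrJ over plain HTTP; the collection name, document id, field, and version value below are made-up examples:

```shell
# 1) Fetch the document's current _version_:
curl 'http://localhost:8983/solr/mycoll/select?q=id:doc1&fl=id,_version_'

# 2) Send an atomic update guarded by that version. If the stored
#    version no longer matches, Solr rejects the update with an
#    HTTP 409 "version conflict" error:
curl -X POST 'http://localhost:8983/solr/mycoll/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","count":{"inc":1},"_version_":1573538179623944192}]'
```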


Re: finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp
If the range query is so much better, shouldn't the Solr query parser 
create a range query for a token query that only contains the wildcard? 
For the *:* case it already contains a special code path.


On 20.07.2017 21:00, Shawn Heisey wrote:

On 7/20/2017 7:20 AM, Hendrik Haddorp wrote:

the Solr 6.6 ref guide states that to "find all documents without a
value for field" you can use:
-field:[* TO *]

While this is true I'm wondering why it is recommended to use a range
query instead of simply:
-field:*

Performance.

A wildcard is expanded to all possible term values for that field.  If
the field has millions of possible terms, then the query object created
at the Lucene level will quite literally have millions of terms in it.
No matter how you approach a query with those characteristics, it's
going to be slow, for both getting the terms list and executing the query.

A full range query might be somewhat slow when there are many possible
values, but it's a lot faster than a wildcard in those cases.

If the field is only used by a handful of documents and has very few
possible values, then it might be faster than a range query ... but this
is not common, so the recommended way to do this is with a range query.

Thanks,
Shawn
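To make the two query forms being compared concrete — the collection and field names are made-up examples:

```shell
# Recommended: negated range query, evaluated as a single range check:
curl 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q=-field:[* TO *]'

# Also works, but the wildcard is first expanded to all distinct terms
# of the field, which is slow when the field has many values:
curl 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q=-field:*'
```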





Re: finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp

forgot the link with the statement:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html

On 20.07.2017 15:20, Hendrik Haddorp wrote:

Hi,

the Solr 6.6 ref guide states that to "find all documents without a 
value for field" you can use:

-field:[* TO *]

While this is true I'm wondering why it is recommended to use a range 
query instead of simply:

-field:*

regards,
Hendrik




finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp

Hi,

the Solr 6.6 ref guide states that to "find all documents without a 
value for field" you can use:

-field:[* TO *]

While this is true I'm wondering why it is recommended to use a range 
query instead of simply:

-field:*

regards,
Hendrik


query rewriting

2017-03-05 Thread Hendrik Haddorp

Hi,

I would like to dynamically modify a query, for example by replacing a 
field name with a different one. Given how complex the query parsing is, 
it looks error-prone to duplicate it, so I would like to work on 
the Lucene Query object model instead. The subclasses of Query look 
relatively simple and easy to rewrite on the Lucene side, but on the 
Solr side this does not seem to be the case. Any suggestions on how 
this could be done?


thanks,
Hendrik


Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-22 Thread Hendrik Haddorp

I'm also not really an HDFS expert but I believe it is slightly different:

The HDFS data is replicated, let's say 3 times, between the HDFS data 
nodes, but to an HDFS client it looks like one directory, and the 
replication is hidden. Every client should see the same data, just like 
every client should see the same data in ZooKeeper 
(every ZK node also has a full replica). So with 2 replicas there should 
only be two disjoint data sets. Thus it should not matter which Solr 
node claims the replica and then continues where things were left off. 
Solr should only be concerned about the replication between the Solr 
replicas, not about the replication between the HDFS data nodes, just as 
it does not have to deal with the replication between the ZK nodes.


Anyhow, for now I would be happy if my patch for SOLR-10092 could get 
included soon, as the auto add replica feature does not work at all 
for me without it :-)


On 22.02.2017 16:15, Erick Erickson wrote:

bq: in the non-HDFS case that sounds logical but in the HDFS case all
the index data is in the shared HDFS file system

That's not really the point, and it's not quite true. The Solr index is
unique _per replica_. So replica1 points to an HDFS directory (that's
triply replicated to be sure). replica2 points to a totally different
set of index files. So with the default replication of 3 your two
replicas will have 6 copies of the index that are totally disjoint in
two sets of three. From Solr's point of view, the fact that HDFS
replicates the data doesn't really alter much.

Autoaddreplica will indeed be able to re-use the HDFS data if a
Solr node goes away. But that doesn't change the replication issue I
described.

At least that's my understanding, I admit I'm not an HDFS guy and it
may be out of date.

Erick

On Tue, Feb 21, 2017 at 10:30 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi Erick,

in the non-HDFS case that sounds logical, but in the HDFS case all the index
data is in the shared HDFS file system. Even the transaction logs should be
in there. So the node that once had the replica should not really have more
information than any other node, especially if legacyCloud is set to false
so that ZooKeeper is the truth.

regards,
Hendrik

On 22.02.2017 02:28, Erick Erickson wrote:

Hendrik:

bq: Not really sure why one replica needs to be up though.

I didn't write the code so I'm guessing a bit, but consider the
situation where you have no replicas for a shard up and add a new one.
Eventually it could become the leader but there would have been no
chance for it to check if its version of the index was up to date.
But since it would be the leader, when other replicas for that shard
_do_ come online they'd replicate the index down from the newly added
replica, possibly using very old data.

FWIW,
Erick

On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

I had opened SOLR-10092
(https://issues.apache.org/jira/browse/SOLR-10092)
for this a while ago. I was now able to get this feature working with a
very small code change. After a few seconds Solr reassigns the replica to a
different Solr instance as long as one replica is still up. Not really
sure
why one replica needs to be up though. I added the patch based on Solr
6.3
to the bug report. Would be great if it could be merged soon.

regards,
Hendrik

On 19.01.2017 17:08, Hendrik Haddorp wrote:

HDFS is like a shared filesystem so every Solr Cloud instance can access
the data using the same path or URL. The clusterstate.json looks like
this:

"shards":{"shard1":{
  "range":"8000-7fff",
  "state":"active",
  "replicas":{
"core_node1":{
  "core":"test1.collection-0_shard1_replica1",
"dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
  "base_url":"http://slave3:9000/solr",
  "node_name":"slave3:9000_solr",
  "state":"active",


"ulogDir":"hdfs://master:8000/test1.collection-0/core_node1/data/tlog"},
"core_node2":{
  "core":"test1.collection-0_shard1_replica2",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node2/data/",
  "base_url":"http://slave2:9000/solr",
  "node_name":"slave2:9000_solr",
  "state":"active",


"ulogDir":"hdfs://master:8000/test1.collection-0/core_node2/data/tlog",
  "leader":"true"},
"core_node3":{
  "core":"test1.collection-0_shard1_replica3",
"dataDir":"hdfs://master:8000/tes

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp

Hi Erick,

in the non-HDFS case that sounds logical, but in the HDFS case all the 
index data is in the shared HDFS file system. Even the transaction logs 
should be in there. So the node that once had the replica should not 
really have more information than any other node, especially if 
legacyCloud is set to false so that ZooKeeper is the truth.


regards,
Hendrik

On 22.02.2017 02:28, Erick Erickson wrote:

Hendrik:

bq: Not really sure why one replica needs to be up though.

I didn't write the code so I'm guessing a bit, but consider the
situation where you have no replicas for a shard up and add a new one.
Eventually it could become the leader but there would have been no
chance for it to check if its version of the index was up to date.
But since it would be the leader, when other replicas for that shard
_do_ come online they'd replicate the index down from the newly added
replica, possibly using very old data.

FWIW,
Erick

On Tue, Feb 21, 2017 at 1:12 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

I had opened SOLR-10092 (https://issues.apache.org/jira/browse/SOLR-10092)
for this a while ago. I was now able to get this feature working with a very
small code change. After a few seconds Solr reassigns the replica to a
different Solr instance as long as one replica is still up. Not really sure
why one replica needs to be up though. I added the patch based on Solr 6.3
to the bug report. Would be great if it could be merged soon.

regards,
Hendrik

On 19.01.2017 17:08, Hendrik Haddorp wrote:

HDFS is like a shared filesystem so every Solr Cloud instance can access
the data using the same path or URL. The clusterstate.json looks like this:

"shards":{"shard1":{
 "range":"8000-7fff",
 "state":"active",
 "replicas":{
   "core_node1":{
 "core":"test1.collection-0_shard1_replica1",
"dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
 "base_url":"http://slave3:9000/solr",
 "node_name":"slave3:9000_solr",
 "state":"active",

"ulogDir":"hdfs://master:8000/test1.collection-0/core_node1/data/tlog"},
   "core_node2":{
 "core":"test1.collection-0_shard1_replica2",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node2/data/",
 "base_url":"http://slave2:9000/solr",
 "node_name":"slave2:9000_solr",
 "state":"active",

"ulogDir":"hdfs://master:8000/test1.collection-0/core_node2/data/tlog",
 "leader":"true"},
   "core_node3":{
 "core":"test1.collection-0_shard1_replica3",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node3/data/",
 "base_url":"http://slave4:9005/solr",
 "node_name":"slave4:9005_solr",
 "state":"active",

"ulogDir":"hdfs://master:8000/test1.collection-0/core_node3/data/tlog"

So every replica is always assigned to one node and this is being stored
in ZK, pretty much the same as for non-HDFS setups. Just as the data is not
stored locally but on the network and as the path does not contain any node
information you can of course easily take over the work to a different Solr
node. You should just need to update the owner of the replica in ZK and you
should basically be done, I assume. That's why the documentation states that
an advantage of using HDFS is that a failing node can be replaced by a
different one. The Overseer just has to move the ownership of the replica,
which seems like what the code is trying to do. There just seems to be a bug
in the code so that the core does not get created on the target node.

Each data directory also contains a lock file. The documentation states
that one should use the HdfsLockFactory, which unfortunately can easily lead
to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual cleanup
is however also easily done but seems to require a node restart to take
effect. But I'm also only recently playing around with all this ;-)

regards,
Hendrik

On 19.01.2017 16:40, Shawn Heisey wrote:

On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:

Given that the data is on HDFS it shouldn't matter if any active
replica is left as the data does not need to get transferred from
another instance but the new core will just take over the existing
data. Thus a replication factor of 1 should also work just in that
case the shard would be down until the new core is up. Anyhow, it
looks like the above call is missing to set t

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp

Hi,

I had opened SOLR-10092 
(https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago. 
I was now able to get this feature working with a very small code change. 
After a few seconds Solr reassigns the replica to a different Solr 
instance as long as one replica is still up. Not really sure why one 
replica needs to be up though. I added the patch based on Solr 6.3 to 
the bug report. Would be great if it could be merged soon.


regards,
Hendrik

On 19.01.2017 17:08, Hendrik Haddorp wrote:
HDFS is like a shared filesystem so every Solr Cloud instance can 
access the data using the same path or URL. The clusterstate.json 
looks like this:


"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node1":{
"core":"test1.collection-0_shard1_replica1",
"dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
"base_url":"http://slave3:9000/solr",
"node_name":"slave3:9000_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node1/data/tlog"}, 


  "core_node2":{
"core":"test1.collection-0_shard1_replica2",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node2/data/",
"base_url":"http://slave2:9000/solr",
"node_name":"slave2:9000_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node2/data/tlog", 


"leader":"true"},
  "core_node3":{
"core":"test1.collection-0_shard1_replica3",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node3/data/",
"base_url":"http://slave4:9005/solr",
"node_name":"slave4:9005_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node3/data/tlog" 



So every replica is always assigned to one node and this is being 
stored in ZK, pretty much the same as for non-HDFS setups. Just as 
the data is not stored locally but on the network and as the path does 
not contain any node information you can of course easily take over 
the work to a different Solr node. You should just need to update the 
owner of the replica in ZK and you should basically be done, I assume. 
That's why the documentation states that an advantage of using HDFS is 
that a failing node can be replaced by a different one. The Overseer 
just has to move the ownership of the replica, which seems like what 
the code is trying to do. There just seems to be a bug in the code so 
that the core does not get created on the target node.


Each data directory also contains a lock file. The documentation 
states that one should use the HdfsLockFactory, which unfortunately 
can easily lead to SOLR-8335, which hopefully will be fixed by 
SOLR-8169. A manual cleanup is however also easily done but seems to 
require a node restart to take effect. But I'm also only recently 
playing around with all this ;-)


regards,
Hendrik

On 19.01.2017 16:40, Shawn Heisey wrote:

On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:

Given that the data is on HDFS it shouldn't matter if any active
replica is left, as the data does not need to get transferred from
another instance; the new core will just take over the existing
data. Thus a replication factor of 1 should also work, just in that
case the shard would be down until the new core is up. Anyhow, it
looks like the above call fails to set the shard id, I guess, or
some code is doing a wrong check.

I know very little about how SolrCloud interacts with HDFS, so although
I'm reasonably certain about what comes below, I could be wrong.

I have not ever heard of SolrCloud being able to automatically take over
an existing index directory when it creates a replica, or even share
index directories unless the admin fools it into doing so without its
knowledge.  Sharing an index directory for replicas with SolrCloud would
NOT work correctly.  Solr must be able to update all replicas
independently, which means that each of them will lock its index
directory and write to it.

It is my understanding (from reading messages on mailing lists) that
when using HDFS, Solr replicas are all separate and consume additional
disk space, just like on a regular filesystem.

I found the code that generates the "No shard id" exception, but my
knowledge of how the zookeeper code in Solr works is not deep enough to
understand what it means or how to fix it.

Thanks,
Shawn







Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Hendrik Haddorp
Might be that your overseer queue is overloaded. Similar to what is 
described here:

https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up

If the overseer queue gets too long you get hit by this:
https://github.com/Netflix/curator/wiki/Tech-Note-4

Try to request the overseer status 
(/solr/admin/collections?action=OVERSEERSTATUS). If that fails you 
likely hit that problem. If so you can also not use the ZooKeeper 
command line client anymore. You can now restart all your ZK nodes with 
an increased jute.maxbuffer value. Once ZK is restarted you can use the 
ZK command line client with the same jute.maxbuffer value and check how 
many entries /overseer/queue has in ZK. Normally there should be a few 
entries but if you see thousands then you should delete them. I used a 
few lines of Java code for that, again setting jute.maxbuffer to the 
same value. Once cleaned up restart the Solr nodes one by one and keep 
an eye on the overseer status.
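
For reference, the "few lines of Java" mentioned above could look roughly like this sketch using the plain ZooKeeper client. The connect string and session timeout are placeholders, and the JVM must be started with -Djute.maxbuffer set to the same increased value as the ZK servers, otherwise reading the oversized queue fails:

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class OverseerQueueCleanup {
    public static void main(String[] args) throws Exception {
        // The watcher lambda ignores all events; only simple synchronous calls are used.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> {});
        List<String> entries = zk.getChildren("/overseer/queue", false);
        System.out.println("overseer queue length: " + entries.size());
        for (String entry : entries) {
            // version -1 deletes the znode regardless of its current version
            zk.delete("/overseer/queue/" + entry, -1);
        }
        zk.close();
    }
}
```

Only do this while checking the overseer status as described above; deleting queue entries throws away pending cluster state updates.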


On 02.02.2017 10:52, Ravi Solr wrote:

Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it.

2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
[c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
 at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
 at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
 at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
 at
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
 at
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
 at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
 at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
 at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:


Hello,
  Yesterday I upgraded from 6.0.1 to 6.4.0; it's been a straight 12
hours of debugging!! Can somebody kindly help me out of this misery.

I have a set of 8 single-shard collections with 3 replicas. As soon as I
updated the configs and started the servers, one of my collections got stuck
with no leader. I have restarted solr to no avail; I also tried to force a
leader via the collections API, but that didn't work either. I also see that,
from time to time, multiple solr nodes go down all at the same time, and only
a restart resolves the issue.

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
to recover. 
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
No registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-
processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
cluster state change: [WatchedEvent state:SyncConnected
type:NodeDataChanged path:/collections/clicktrack/state.json] for
collection [clicktrack] has occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-
processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
cluster state change: [WatchedEvent 

Re: How long for autoAddReplica?

2017-02-02 Thread Hendrik Haddorp

Hi,

are you using HDFS? According to the documentation the feature should be 
only available if you are using HDFS. For me it did however also fail on 
that. See the thread "Solr on HDFS: AutoAddReplica does not add a 
replica" from about two weeks ago.


regards,
Hendrik

On 02.02.2017 07:21, Walter Underwood wrote:

I added a new node and shut down a node with a shard replica on it. It has been 
an hour and I don't see any activity toward making a new replica.

The new node and the one I shut down are both 6.4. The rest of the 16-node 
cluster is 6.2.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)







Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-19 Thread Hendrik Haddorp
HDFS is like a shared filesystem so every Solr Cloud instance can access 
the data using the same path or URL. The clusterstate.json looks like this:


"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node1":{
"core":"test1.collection-0_shard1_replica1",
"dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
"base_url":"http://slave3:9000/solr;,
"node_name":"slave3:9000_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node1/data/tlog"},
  "core_node2":{
"core":"test1.collection-0_shard1_replica2",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node2/data/",
"base_url":"http://slave2:9000/solr;,
"node_name":"slave2:9000_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node2/data/tlog",
"leader":"true"},
  "core_node3":{
"core":"test1.collection-0_shard1_replica3",
"dataDir":"hdfs://master:8000/test1.collection-0/core_node3/data/",
"base_url":"http://slave4:9005/solr;,
"node_name":"slave4:9005_solr",
"state":"active",
"ulogDir":"hdfs://master:8000/test1.collection-0/core_node3/data/tlog"

So every replica is always assigned to one node and this is stored in ZK, 
pretty much the same as for non-HDFS setups. As the data is not stored 
locally but on the network, and the path does not contain any node 
information, a different Solr node can of course easily take over the 
work. You should just need to update the owner of the replica in ZK and 
you should basically be done, I assume. That's why the documentation 
states that an advantage of using HDFS is that a failing node can be 
replaced by a different one. The Overseer just has to move the ownership 
of the replica, which seems like what the code is trying to do. There 
just seems to be a bug in the code so that the core does not get created 
on the target node.


Each data directory also contains a lock file. The documentation states 
that one should use the HdfsLockFactory, which unfortunately can easily 
lead to SOLR-8335, which hopefully will be fixed by SOLR-8169. A manual 
cleanup is however also easily done but seems to require a node restart 
to take effect. But I'm also only recently playing around with all this ;-)


regards,
Hendrik

On 19.01.2017 16:40, Shawn Heisey wrote:

On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:

Given that the data is on HDFS it shouldn't matter if any active
replica is left, as the data does not need to get transferred from
another instance; the new core will just take over the existing
data. Thus a replication factor of 1 should also work, just in that
case the shard would be down until the new core is up. Anyhow, it
looks like the above call fails to set the shard id, I guess, or
some code is doing a wrong check.

I know very little about how SolrCloud interacts with HDFS, so although
I'm reasonably certain about what comes below, I could be wrong.

I have not ever heard of SolrCloud being able to automatically take over
an existing index directory when it creates a replica, or even share
index directories unless the admin fools it into doing so without its
knowledge.  Sharing an index directory for replicas with SolrCloud would
NOT work correctly.  Solr must be able to update all replicas
independently, which means that each of them will lock its index
directory and write to it.

It is my understanding (from reading messages on mailing lists) that
when using HDFS, Solr replicas are all separate and consume additional
disk space, just like on a regular filesystem.

I found the code that generates the "No shard id" exception, but my
knowledge of how the zookeeper code in Solr works is not deep enough to
understand what it means or how to fix it.

Thanks,
Shawn





Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-19 Thread Hendrik Haddorp

Hi,
I'm seeing the same issue on Solr 6.3 using HDFS and a replication 
factor of 3, even though I believe a replication factor of 1 should work 
the same. When I stop a Solr instance this is detected and Solr actually 
wants to create a replica on a different instance. The command for that 
does however fail:


o.a.s.c.OverseerAutoReplicaFailoverThread Exception trying to create new 
replica on 
http://...:9000/solr:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://...:9000/solr: Error CREATEing SolrCore 
'test2.collection-09_shard1_replica1': Unable to create core 
[test2.collection-09_shard1_replica1] Caused by: No shard id for 
CoreDescriptor[name=test2.collection-09_shard1_replica1;instanceDir=/var/opt/solr/test2.collection-09_shard1_replica1]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.createSolrCore(OverseerAutoReplicaFailoverThread.java:456)
at 
org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.lambda$addReplica$0(OverseerAutoReplicaFailoverThread.java:251)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Given that the data is on HDFS it shouldn't matter if any active replica 
is left, as the data does not need to get transferred from another 
instance; the new core will just take over the existing data. Thus a 
replication factor of 1 should also work, just in that case the shard 
would be down until the new core is up. Anyhow, it looks like the above 
call fails to set the shard id, I guess, or some code is doing a wrong 
check.


On 14.01.2017 02:44, Shawn Heisey wrote:

On 1/13/2017 5:46 PM, Chetas Joshi wrote:

One of the things I have observed is: if I use the collection API to
create a replica for that shard, it does not complain about the config
which has been set to ReplicationFactor=1. If replication factor was
the issue as suggested by Shawn, shouldn't it complain?

The replicationFactor value is used by exactly two things:  initial
collection creation, and autoAddReplicas.  It will not affect ANY other
command or operation, including ADDREPLICA.  You can create MORE
replicas than replicationFactor indicates, and there will be no error
messages or warnings.

In order to have a replica automatically added, your replicationFactor
must be at least two, and the number of active replicas in the cloud for
a shard must be less than that number.  If that's the case and the
expiration times have been reached without recovery, then Solr will
automatically add replicas until there are at least as many replicas
operational as specified in replicationFactor.


I would also like to mention that I experienced some instance dirs
getting deleted and also found this open bug
(https://issues.apache.org/jira/browse/SOLR-8905)

The description on that issue is incomprehensible.  I can't make any
sense out of it.  It mentions the core.properties file, but the error
message shown doesn't talk about the properties file at all.  The error
and issue description seem to have nothing at all to do with the code
lines that were quoted.  Also, it was reported on version 4.10.3 ... but
this is going to be significantly different from current 6.x versions,
and the 4.x versions will NOT be updated with bugfixes.

Thanks,
Shawn
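
To illustrate the distinction Shawn describes: an explicit ADDREPLICA goes through regardless of replicationFactor, which is only consulted at creation time and by autoAddReplicas. A hedged SolrJ sketch (collection name, shard name, and ZK address are placeholders; assumes a 6.x CloudSolrClient):

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AddReplicaManually {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181").build()) {
            // Succeeds even if the shard already has replicationFactor
            // replicas; no warning is emitted either way.
            CollectionAdminRequest.AddReplica req =
                    CollectionAdminRequest.addReplicaToShard("mycollection", "shard1");
            req.process(client);
        }
    }
}
```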





huge amount of overseer queue entries

2017-01-18 Thread Hendrik Haddorp

Hi,

I have a 6.2.1 solr cloud setup with 5 nodes containing close to 3000 
collections with one shard and three replicas each. It looks like when 
nodes crash the overseer queue can grow wildly until ZooKeeper no longer 
works correctly. This looks pretty much like SOLR-5961 
(https://issues.apache.org/jira/browse/SOLR-5961). The only solution 
seems to be to delete the overseer queue entries. If I notice the 
problem too late and ZK is no longer working correctly, then setting 
jute.maxbuffer allows clearing the entries again, as also described here: 
https://cwiki.apache.org/confluence/display/CURATOR/TN4.


Is there some way to prevent the overseer from running amok?

regards,
Hendrik


Re: ClusterStateMutator

2017-01-05 Thread Hendrik Haddorp
The UI warning was quite easy to resolve. I'm currently testing Solr 
with HDFS but for some reason the core ended up on the local storage of 
the node. After a delete and restart the problem was gone.


On 05.01.2017 12:42, Hendrik Haddorp wrote:
Right, I had to do that multiple times already when I restarted nodes 
during collection creation. In such cases I was left with data in the 
clusterstate.json, which, at least on 6.2.1, blocked further collection 
creations. Once it was manually deleted or set to {}, collection creation 
worked again.


Setting legacyCloud=false looks good. I don't get anything in 
clusterstate.json anymore and no old collections show up after a node 
restarts. I could also confirm what Shalin said, that state format 2 
is used by default. Only if I explicitly set the state format to 1 
do I see data in clusterstate.json during collection creation. Just the 
Solr admin UI is now showing "SolrCore Initialization Failures" 
pointing to non-existing replicas. I assume that happens when Solr 
starts up and finds data for a core that does not exist in ZK anymore. 
How would one clean up this issue? Besides that, some replicas can 
still end up broken if the node restarts at the wrong time. I 
currently have one replica marked as down and one as gone. So far I 
was however always able to manually replace these replicas to resolve 
this state. So in general this looks quite good now. Guess I will 
still need to find a way to make sure that I don't restart a node 
during collection creation :-(


On 05.01.2017 02:33, Erick Erickson wrote:

Let us know how it goes. You'll probably want to remove the _contents_
of clusterstate.json and just leave it as a pair of brackets, i.e. {}
if for no other reason than it's confusing.

Times past the node needed to be there even if empty. Although I just
tried removing it completely on 6x and I was able to start Solr, part
of the startup process recreates it as an empty node, just a pair of
braces.

Best,
Erick

On Wed, Jan 4, 2017 at 1:22 PM, Hendrik Haddorp 
<hendrik.hadd...@gmx.net> wrote:

Hi Erik,

I have actually also seen that behavior already. So will check what
happens when I set that property.
I still believe I'm getting the clusterstate.json set already before 
the

node comes up again. But I will try to verify that further tomorrow.

thanks,
Hendrik

On 04/01/17 22:10, Erick Erickson wrote:

Hendrik:

Historically in 4.x, there was code that would reconstruct the
clusterstate.json code. So you would see "deleted" collections come
back. One scenario was:

- Have a Solr node offline that had a replica for a collection.
- Delete that collection
- Bring the node back
- It would register itself in clusterstate.json.

So my guess is that something like this is going on and you're getting
a clusterstate.json that's reconstructed (and possibly not complete).

You can avoid this by specifying legacyCloud=false clusterprop

Kind of a shot in the dark...

Erick

On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:
You are right, the code looks like it. But why did I then see 
collection

data in the clusterstate.json file? If version 1 is not used I would
assume that no data ends up in there. When explicitly setting the 
state
format 2 the system seemed to behave differently. And if the code 
always
uses version 2 shouldn't the default in that line be changed 
accordingly?


On 04/01/17 16:41, Shalin Shekhar Mangar wrote:

Actually the state format defaults to 2 since many releases (all of
6.x at least). This default is enforced in CollectionsHandler much
before the code in ClusterStateMutator is executed.

On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp 
<hendrik.hadd...@gmx.net> wrote:

Hi,

in
solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java 


there is the following code starting line 107:

//TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null
    : ZkStateReader.getCollectionPath(cName);

Any idea if that will be changed to default to version 2 anytime soon?

thanks,
Hendrik






Re: ClusterStateMutator

2017-01-05 Thread Hendrik Haddorp
Right, I had to do that multiple times already when I restarted nodes 
during collection creation. In such cases I was left with data in the 
clusterstate.json, which, at least on 6.2.1, blocked further collection 
creations. Once it was manually deleted or set to {}, collection creation 
worked again.


Setting legacyCloud=false looks good. I don't get anything in 
clusterstate.json anymore and no old collections show up after a node 
restarts. I could also confirm what Shalin said, that state format 2 is 
used by default. Only if I explicitly set the state format to 1 do I see 
data in clusterstate.json during collection creation. Just the Solr admin 
UI is now showing "SolrCore Initialization Failures" pointing to 
non-existing replicas. I assume that happens when Solr starts up and finds 
data for a core that does not exist in ZK anymore. How would one clean 
up this issue? Besides that, some replicas can still end up broken if 
the node restarts at the wrong time. I currently have one replica marked 
as down and one as gone. So far I was however always able to manually 
replace these replicas to resolve this state. So in general this looks 
quite good now. Guess I will still need to find a way to make sure that 
I don't restart a node during collection creation :-(


On 05.01.2017 02:33, Erick Erickson wrote:

Let us know how it goes. You'll probably want to remove the _contents_
of clusterstate.json and just leave it as a pair of brackets, i.e. {}
if for no other reason than it's confusing.

Times past the node needed to be there even if empty. Although I just
tried removing it completely on 6x and I was able to start Solr, part
of the startup process recreates it as an empty node, just a pair of
braces.

Best,
Erick

On Wed, Jan 4, 2017 at 1:22 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Hi Erik,

I have actually also seen that behavior already. So will check what
happens when I set that property.
I still believe I'm getting the clusterstate.json set already before the
node comes up again. But I will try to verify that further tomorrow.

thanks,
Hendrik

On 04/01/17 22:10, Erick Erickson wrote:

Hendrik:

Historically in 4.x, there was code that would reconstruct the
clusterstate.json code. So you would see "deleted" collections come
back. One scenario was:

- Have a Solr node offline that had a replica for a collection.
- Delete that collection
- Bring the node back
- It would register itself in clusterstate.json.

So my guess is that something like this is going on and you're getting
a clusterstate.json that's reconstructed (and possibly not complete).

You can avoid this by specifying legacyCloud=false clusterprop

Kind of a shot in the dark...

Erick

On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

You are right, the code looks like it. But why did I then see collection
data in the clusterstate.json file? If version 1 is not used I would
assume that no data ends up in there. When explicitly setting the state
format 2 the system seemed to behave differently. And if the code always
uses version 2 shouldn't the default in that line be changed accordingly?

On 04/01/17 16:41, Shalin Shekhar Mangar wrote:

Actually the state format defaults to 2 since many releases (all of
6.x at least). This default is enforced in CollectionsHandler much
before the code in ClusterStateMutator is executed.

On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Hi,

in
solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java
there is the following code starting line 107:

//TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null
    : ZkStateReader.getCollectionPath(cName);

Any idea if that will be changed to default to version 2 anytime soon?

thanks,
Hendrik




Re: ClusterStateMutator

2017-01-04 Thread Hendrik Haddorp
Hi Erik,

I have actually also seen that behavior already. So will check what
happens when I set that property.
I still believe I'm getting the clusterstate.json set already before the
node comes up again. But I will try to verify that further tomorrow.

thanks,
Hendrik

On 04/01/17 22:10, Erick Erickson wrote:
> Hendrik:
>
> Historically in 4.x, there was code that would reconstruct the
> clusterstate.json code. So you would see "deleted" collections come
> back. One scenario was:
>
> - Have a Solr node offline that had a replica for a collection.
> - Delete that collection
> - Bring the node back
> - It would register itself in clusterstate.json.
>
> So my guess is that something like this is going on and you're getting
> a clusterstate.json that's reconstructed (and possibly not complete).
>
> You can avoid this by specifying legacyCloud=false clusterprop
>
> Kind of a shot in the dark...
>
> Erick
>
> On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp
> <hendrik.hadd...@gmx.net> wrote:
>> You are right, the code looks like it. But why did I then see collection
>> data in the clusterstate.json file? If version 1 is not used I would
>> assume that no data ends up in there. When explicitly setting the state
>> format 2 the system seemed to behave differently. And if the code always
>> uses version 2 shouldn't the default in that line be changed accordingly?
>>
>> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>>> Actually the state format defaults to 2 since many releases (all of
>>> 6.x at least). This default is enforced in CollectionsHandler much
>>> before the code in ClusterStateMutator is executed.
>>>
>>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net> 
>>> wrote:
>>>> Hi,
>>>>
>>>> in
>>>> solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java
>>>> there is the following code starting line 107:
>>>>
>>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null
>>>>     : ZkStateReader.getCollectionPath(cName);
>>>>
>>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>>
>>>> thanks,
>>>> Hendrik
>>>



Re: ClusterStateMutator

2017-01-04 Thread Hendrik Haddorp
You are right, the code looks like it. But why did I then see collection
data in the clusterstate.json file? If version 1 is not used I would
assume that no data ends up in there. When explicitly setting the state
format 2 the system seemed to behave differently. And if the code always
uses version 2 shouldn't the default in that line be changed accordingly?

On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
> Actually the state format defaults to 2 since many releases (all of
> 6.x at least). This default is enforced in CollectionsHandler much
> before the code in ClusterStateMutator is executed.
>
> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net> 
> wrote:
>> Hi,
>>
>> in
>> solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java
>> there is the following code starting line 107:
>>
>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null
>>     : ZkStateReader.getCollectionPath(cName);
>>
>> Any idea if that will be changed to default to version 2 anytime soon?
>>
>> thanks,
>> Hendrik
>
>



Re: create collection gets stuck on node restart

2017-01-04 Thread Hendrik Haddorp
Problem is that we would like to run without downtime. Rolling updates 
worked fine so far except when creating a collection at the wrong time. 
I just did another test with stateFormat=2. This seems to greatly 
improve the situation. One collection creation got stuck but other 
creations still worked and after a restart of some nodes the stuck 
collection creation also looked ok. For some reason it just resulted in 
two replicas for the same shard getting assigned to the same node even 
though I specified a rule of "shard:*,replica:<2,node:*".


On 03.01.2017 15:34, Shawn Heisey wrote:

On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:

I have a SolrCloud setup with 5 nodes and am creating collections with
a replication factor of 3. If I kill and restart nodes at the "right"
time during the creation process the creation seems to get stuck.
Collection data is left in the clusterstate.json file in ZooKeeper and
no collections can be created anymore until this entry gets removed. I
can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be
somewhat less likely to get stuck. Is Solr supposed to recover from
data being stuck in the clusterstate.json at some point? I had one
instance where it looked like data was removed again but normally the
data does not seem to get cleaned up automatically and just blocks any
further collection creations.

I did not find anything like this in Jira. Just SOLR-7198 sounds a bit
similar even though it is about deleting collections.

Don't restart your nodes at the same time you're trying to do
maintenance of any kind on your collections.  Try to only do maintenance
when they are all working, or you'll get unexpected results.

The most recent development goal is make it so that collection deletion
can be done even if the creation was partial.  The idea is that if
something goes wrong, you can delete the bad collection and then be free
to try to create it again.  I see that you've started another thread
about deletion not fully eliminating everything in HDFS.  That does
sound like a bug.  I have no experience with HDFS at all, so I can't be
helpful with that.

Thanks,
Shawn





ClusterStateMutator

2017-01-04 Thread Hendrik Haddorp

Hi,

in 
solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java 
there is the following code starting line 107:


//TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null
    : ZkStateReader.getCollectionPath(cName);

Any idea if that will be changed to default to version 2 anytime soon?

thanks,
Hendrik


HDFS support maturity

2017-01-03 Thread Hendrik Haddorp

Hi,

is the HDFS support in Solr 6.3 considered production ready?
Any idea how many setups might be using this?

thanks,
Hendrik


deleting a collection leaves empty directories in an HDFS setup

2017-01-03 Thread Hendrik Haddorp

Hi,

playing around with Solr 6.3 and HDFS I noticed that after deleting a 
collection the directories for the Solr cores are left in HDFS. There is 
no data left in them, but still this doesn't look clean to me.


regards,
Hendrik


create collection gets stuck on node restart

2017-01-03 Thread Hendrik Haddorp

Hi,

I have a SolrCloud setup with 5 nodes and am creating collections with a 
replication factor of 3. If I kill and restart nodes at the "right" time 
during the creation process the creation seems to get stuck. Collection 
data is left in the clusterstate.json file in ZooKeeper and no 
collections can be created anymore until this entry gets removed. I can 
reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be somewhat 
less likely to get stuck. Is Solr supposed to recover from data being 
stuck in the clusterstate.json at some point? I had one instance where 
it looked like data was removed again but normally the data does not 
seem to get cleaned up automatically and just blocks any further 
collection creations.


I did not find anything like this in Jira. Just SOLR-7198 sounds a bit 
similar even though it is about deleting collections.


regards,
Hendrik


Re: Soft commit and reading data just after the commit

2016-12-19 Thread Hendrik Haddorp

Hi,

the SolrJ API has this method: SolrClient.commit(String collection, 
boolean waitFlush, boolean waitSearcher, boolean softCommit).
My assumption so far is that when you set waitSearcher to true, the 
method call only returns once a search would find the new data, which 
sounds like what you want. I have used this already and it seemed to work just fine.
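A minimal SolrJ sketch of that pattern (host, collection name, and document fields are placeholders, and the SolrJ jar must be on the classpath):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            doc.addField("read_flag", "READ");
            client.add("mycollection", doc);

            // waitFlush=true, waitSearcher=true, softCommit=true:
            // the call blocks until the new searcher is open, so a query
            // issued after this line should see the document.
            client.commit("mycollection", true, true, true);
        }
    }
}
```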


regards,
Hendrik

On 19.12.2016 04:09, Lasitha Wattaladeniya wrote:

Hi all,

Thanks for your replies,

@dorian : the requirement is, we are showing a list of entries on a page.
For each user there's a read/unread flag. The data for the listing is
fetched from Solr, so you can see whether an entry was previously read or
not. When a user views an entry by clicking, we update the database flag
to READ and use real-time indexing to update the Solr entry. So when the
user closes the full view of the entry and goes back to the entry listing
page, the data fetched from Solr should show READ. That's the use case we
are trying to fix.

@eric : thanks for the lengthy reply. So let's say I increase the
autoSoftCommit timeout to maybe 100 ms. In that case, do I have to wait
that long on the client side before searching? What's the correct way of
achieving this?

Regards,
Lasitha

On 18 Dec 2016 23:52, "Erick Erickson"  wrote:


1 ms autocommit is far too frequent. And it's not
helping you anyway.

There is some lag between when a commit happens
and when the docs are really available. The sequence is:
1> commit (soft or hard-with-opensearcher=true doesn't matter).
2> a new searcher is opened and autowarming starts
3> until the new searcher is opened, queries continue to be served by
the old searcher
4> the new searcher is fully opened
5> _new_ requests are served by the new searcher.
6> the last request is finished by the old searcher and it's closed.

So what's probably happening is that you send docs and then send a
query and Solr is still in step <3>. You can look at your admin UI
plugins/stats page or your log to see how long it takes for a
searcher to open and adjust your expectations accordingly.

If you want to fetch only the document (not get it via a search), Real
Time Get is designed to ensure that you always get the most recent copy,
whether it's searchable or not.
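Real Time Get is exposed in SolrJ through getById(), which goes to the /get handler. A sketch (host, collection, and document id are placeholders; needs the SolrJ jar and a running Solr):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class RtgExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // getById() uses the /get (Real Time Get) handler, which also
            // consults the transaction log, so it returns the latest version
            // of the document even before a commit has made it searchable.
            SolrDocument doc = client.getById("mycollection", "42");
            System.out.println(doc);
        }
    }
}
```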

All that said, Solr wasn't designed for autocommits that are that
frequent. That's why the documentation talks about _Near_ Real Time.
You may need to adjust your expectations.

Best,
Erick

On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha 
wrote:

There's a very high probability that you're using the wrong tool for the
job if you need a 1 ms softCommit time, especially when you always need it
(e.g. there are apps where you need commit-after-insert only very rarely).

So explain what you're using it for ?

On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya <

watt...@gmail.com>

wrote:


Hi Furkan,

Thanks for the links. I had read the first one but not the second one; I
did read it after you sent it. So in my current solrconfig.xml, the
configurations below are what I have:


<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
</autoSoftCommit>

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

The problem I'm facing is: just after adding documents to Solr using
SolrJ, when I retrieve data from Solr I am not getting the updated
results. This happens from time to time. Most of the time I get the
correct data, but on some occasions I get stale results. So, as you
suggest, what is the best practice to use here? Should I wait 1
millisecond before querying for updated results?

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com

On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI 
wrote:


Hi Lasitha,

First of all, did you check these:

https://cwiki.apache.org/confluence/display/solr/Near+

Real+Time+Searching

https://lucidworks.com/blog/2013/08/23/understanding-
transaction-logs-softcommit-and-commit-in-sorlcloud/

after that, if you cannot adjust your configuration you can give more
information and we can find a solution.

Kind Regards,
Furkan KAMACI

On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya <

watt...@gmail.com>

wrote:


Hi furkan,

Thanks for your reply. It is generally a query-heavy system. We are
using real-time indexing for editing the available data.

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com

On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI <

furkankam...@gmail.com>

wrote:


Hi Lasitha,

What are your indexing/querying requirements? Do you have an
index-heavy/light, query-heavy/light system?

Kind Regards,
Furkan KAMACI

On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
watt...@gmail.com>
wrote:


Hello devs,

I'm here with another problem I'm facing. I'm trying to do a commit
(soft commit) through SolrJ and, just after the commit, retrieve the data
from Solr (requirement is to get
