[IMPORTANT] Apache Solr TLP Update - Solr User email list migration

2021-02-23 Thread Anshum Gupta
Hi Solr Users,

As part of setting up Apache Solr as a Top Level Project, we’re migrating
the existing solr-user@lucene.apache.org mailing list to
us...@solr.apache.org.

All existing subscriptions and conversations will be migrated to the new
list, but if you have any mail client filters, please update them accordingly.

The migration has been requested, and ASF Infra is working with the
Lucene/Solr PMC on it [1].

We will update the list once the migration is completed.

- Anshum Gupta
On behalf of the Apache Solr PMC

[1] - https://issues.apache.org/jira/browse/INFRA-21443


Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Anshum Gupta
Hi everyone,

I’d like to inform everyone that the newly formed Apache Solr PMC nominated
and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
President. This decision was approved by the board in its February 2021
meeting.

Congratulations Jan!

-- 
Anshum Gupta


[ANNOUNCE] Apache Solr TLP Created

2021-02-18 Thread Anshum Gupta
Hi everyone,

On behalf of the Apache Lucene PMC, and the newly formed Apache Solr PMC,
I’d like to inform folks that the ASF board has approved the resolution to
create the Solr TLP (Top Level Project).

We are currently working on the next steps but would like to assure the
community that they can continue to expect critical bug fixes for releases
previously made under the Apache Lucene project.

We will send another update as the mailing lists and website are set up for
the Solr project.

-Anshum
On behalf of the Apache Lucene and Solr PMC


Re: Solr Slack Workspace

2021-02-05 Thread Anshum Gupta
Hey Ishan,

Thanks for doing this. Is this the ASF Slack space or something else?


On Tue, Feb 2, 2021 at 2:04 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Hi all,
> I've created an invite link for the Slack workspace:
> https://s.apache.org/solr-slack.
> Please test it out. I'll send a broader notification once this is tested
> out to be working well.
> Thanks and regards,
> Ishan
>
> On Thu, Jan 28, 2021 at 12:26 AM Justin Sweeney <
> justin.sweene...@gmail.com>
> wrote:
>
> > Thanks, I joined the Relevance Slack:
> > https://opensourceconnections.com/slack. I definitely think a dedicated
> > Solr workspace would also be good, allowing for channels to get involved
> > with development as well as user-based questions.
> >
> > It does seem like Slack has made it increasingly difficult to create open
> > workspaces and not force someone to approve or only allow specific email
> > domains. Has anyone tried to do that recently? I tried for an hour or so
> > last weekend and it seemed to not be very straightforward anymore.
> >
> > On Tue, Jan 26, 2021 at 12:57 PM Houston Putman wrote:
> >
> > > There is https://solr-dev.slack.com
> > >
> > > It's not really used, but it's there and we can open it up for people
> to
> > > join and start using.
> > >
> > > On Tue, Jan 26, 2021 at 5:38 AM Ishan Chattopadhyaya <
> > > ichattopadhy...@gmail.com> wrote:
> > >
> > > > Thanks ufuk. I'll take a look.
> > > >
> > > > On Tue, 26 Jan, 2021, 4:05 pm ufuk yılmaz wrote:
> > > >
> > > > > It’s asking for a searchscale.com email address?
> > > > >
> > > > > Sent from Mail for Windows 10
> > > > >
> > > > > From: Ishan Chattopadhyaya
> > > > > Sent: 26 January 2021 13:33
> > > > > To: solr-user
> > > > > Subject: Re: Solr Slack Workspace
> > > > >
> > > > > There is a Slack backed by official IRC support. Please see
> > > > >
> https://lucene.472066.n3.nabble.com/Solr-Users-Slack-td4466856.html
> > > for
> > > > > details on how to join it.
> > > > >
> > > > > On Tue, 19 Jan, 2021, 2:54 pm Charlie Hull, <
> > > > > ch...@opensourceconnections.com>
> > > > > wrote:
> > > > >
> > > > > > Relevance Slack is open to anyone working on search & relevance -
> > > #solr
> > > > > is
> > > > > > only one of the channels, there's lots more! Hope to see you
> there.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > Charlie
> > > > > > https://opensourceconnections.com/slack
> > > > > >
> > > > > >
> > > > > > On 16/01/2021 02:18, matthew sporleder wrote:
> > > > > > > IRC has kind of died off,
> > > > > > > https://lucene.apache.org/solr/community.html has a slack
> > > mentioned,
> > > > > > > I'm on https://opensourceconnections.com/slack after taking
> > their
> > > > solr
> > > > > > > training class and assume it's mostly open to solr community.
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:10 PM Justin Sweeney
> > > > > > >  wrote:
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I did some googling and didn't find anything, but is there a
> > Slack
> > > > > > >> workspace for Solr? I think this could be useful to expand
> > > > interaction
> > > > > > >> within the community of Solr users and connect people solving
> > > > similar
> > > > > > >> problems.
> > > > > > >>
> > > > > > >> I'd be happy to get this setup if it does not exist already.
> > > > > > >>
> > > > > > >> Justin
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Charlie Hull - Managing Consultant at OpenSource Connections
> > Limited
> > > > > > 
> > > > > > Founding member of The Search Network <
> > https://thesearchnetwork.com/
> > > >
> > > > > > and co-author of Searching the Enterprise
> > > > > > <https://opensourceconnections.com/about-us/books-resources/>
> > > > > > tel/fax: +44 (0)8700 118334
> > > > > > mobile: +44 (0)7767 825828
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Anshum Gupta


Re: CPU and memory circuit breaker documentation issues

2020-12-18 Thread Anshum Gupta
Hi Walter,

Thanks for taking this up.

You can file a PR for the documentation change too as our docs are now a
part of the repo. Here's where you can find the docs:
https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide


On Fri, Dec 18, 2020 at 9:26 AM Walter Underwood 
wrote:

> Looking at the code, the CPU circuit breaker is unusable.
>
> This actually does use Unix load average
> (operatingSystemMXBean.getSystemLoadAverage()). That is a terrible idea.
> Interpreting the load average requires knowing the number of CPUs on a
> system. If I have 16 CPUs, I would probably set the limit at 16, with one
> process waiting for each CPU.
>
> Unfortunately, this implementation limits the thresholds to 0.5 to 0.95,
> because the implementer thought they were getting a CPU usage value, I
> guess. So the whole thing doesn’t work right.
>
> I’ll file a bug and submit a patch to use
> OperatingSystemMXBean.getSystemCPULoad(). How do I fix the documentation?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Dec 16, 2020, at 10:41 AM, Walter Underwood 
> wrote:
> >
> > In https://lucene.apache.org/solr/guide/8_7/circuit-breakers.html
> >
> > The URL to Wikipedia is broken, but that doesn’t matter, because that
> article is about a different metric. The Unix “load average” is the length
> of the run queue, the number of processes or threads waiting to run. That
> can go much, much higher than 1.0. In a high load system, I’ve seen it at
> 2X the number of CPUs or higher.
> >
> > Remove that link, it is misleading.
> >
> > The page should list the JMX metrics that are used for this. I’m
> guessing this uses OperatingSystemMXBean.getSystemCPULoad(). That metric
> goes from 0.0 to 1.0.
> >
> >
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html
> >
> > I can see where the “load average” and “getSystemCPULoad” names cause
> confusion, but this should be correct in the documents.
> >
> > Which metric is used for the memory threshold? My best guess is that the
> percentage is calculated from the MemoryUsage object returned by
> MemoryMXBean.getHeapMemoryUsage().
> >
> >
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryMXBean.html
> >
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryUsage.html
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
>
>

-- 
Anshum Gupta
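
For reference, a minimal standalone sketch of the two JMX metrics being
conflated in this thread, assuming a HotSpot JVM where the com.sun.management
extension is available (the Javadoc spells the method getSystemCpuLoad):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JmxMetrics {
  public static void main(String[] args) {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
    // Run-queue length averaged over the last minute; it scales with the
    // number of CPUs, so it can sit far above 1.0 on a healthy machine.
    System.out.println("load average    = " + os.getSystemLoadAverage());
    // Whole-machine CPU usage as a fraction in [0.0, 1.0].
    System.out.println("system CPU load = " + os.getSystemCpuLoad());
    // The likely source of a heap-percentage metric: used vs. max heap.
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    System.out.println("heap used/max   = " + heap.getUsed() + " / " + heap.getMax());
  }
}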


Re: Performance issues with CursorMark

2020-10-26 Thread Anshum Gupta
Hey Markus,

What are you sorting on? Do you have docValues enabled on the sort field?
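
For context, cursorMark paging sorts the whole result set, which is far
cheaper when the sort field has docValues. A hypothetical schema.xml field
definition with docValues enabled (field name and type are illustrative):

<field name="timestamp_dt" type="pdate" indexed="true" stored="true" docValues="true"/>

Note that adding docValues to an existing field requires a full reindex.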

On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma 
wrote:

> Hello,
>
> We have been using a simple Python tool for a long time that eases
> movement of data between Solr collections; it uses CursorMark to fetch
> small or large pieces of data. Recently it stopped working when moving data
> from a production collection to my local machine for testing: the Solr
> nodes began to run OOM.
>
> I added 500M to the 3G heap and now it works again, but slowly (240 docs/s),
> costing 3G of the entire heap just to move 32k docs out of 76m total.
>
> Solr 8.6.0 is running with two shards (1 leader + 1 replica); each shard has
> 38m docs and almost no deletions (0.4%), taking up ~10.6g of disk space. The
> documents are very small, they are logs of various interactions of users
> with our main text search engine.
>
> I monitored all four nodes with VisualVM during the transfer; all four
> went up to 3g heap consumption very quickly. After the transfer it took a
> while for two nodes to (forcefully) release the heap space that was no
> longer needed for the transfer. The other two nodes, now, 17 minutes later,
> still think they have to hang on to their heap. When I start the same
> transfer again, the nodes that already have high memory consumption just
> seem to reuse it, not consuming additional heap. At least the second time
> it went at 920 docs/s, whereas we are used to transferring these tiny
> documents at a light speed of multiple thousands per second.
>
> What is going on? We do not need additional heap, Solr is clearly not
> asking for more and GC activity is minimal. Why did it become so slow?
> Regular queries on the collection are still going fast, but cursorMark
> paging through even a tiny portion takes time and memory.
>
> Many thanks,
> Markus
>


-- 
Anshum Gupta


ApacheCon at Home 2020 starts tomorrow!

2020-09-28 Thread Anshum Gupta
Hey everyone!

ApacheCon at Home 2020 starts tomorrow. The event is 100% virtual, and free
to register. What’s even better is that this year we have reintroduced the
Lucene/Solr/Search track at ApacheCon.

With 2 full days of sessions covering various Lucene, Solr, and Search
topics, I hope you are able to find some time to attend the sessions and
learn something new and interesting.

There are also various other tracks that span the 3 days of the conference.
The conference starts in just a few hours for our community in Asia and
tomorrow morning for the Americas and Europe. Check out the complete
schedule in the link below.

Here are a few resources you may find useful if you plan to attend
ApacheCon at Home.

ApacheCon website - https://www.apachecon.com/acna2020/index.html
Registration - https://hopin.to/events/apachecon-home
Slack - http://s.apache.org/apachecon-slack
Search Track - https://www.apachecon.com/acah2020/tracks/search.html

See you at ApacheCon.

-- 
Anshum Gupta


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Anshum Gupta
> > >>>> the
> > >>>> development. We envision that using packages for these components
> via
> > >>>> package manager will actually make it easier for users to use such
> > >>> features.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Ishan Chattopadhyaya
> > >>>>
> > >>>> (On behalf of the Apache Lucene/Solr PMC)
> > >>>>
> > >>>> [0] -
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
> > >>>>
> > >>>> On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant <
> > >> bruno.roust...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> The Lucene PMC is pleased to announce the release of Apache Solr
> > >> 8.6.0.
> > >>>>>
> > >>>>>
> > >>>>> Solr is the popular, blazing fast, open source NoSQL search
> platform
> > >>> from
> > >>>>> the Apache Lucene project. Its major features include powerful
> > >> full-text
> > >>>>> search, hit highlighting, faceted search, dynamic clustering,
> > database
> > >>>>> integration, rich document handling, and geospatial search. Solr is
> > >>> highly
> > >>>>> scalable, providing fault tolerant distributed search and indexing,
> > >> and
> > >>>>> powers the search and navigation features of many of the world's
> > >> largest
> > >>>>> internet sites.
> > >>>>>
> > >>>>>
> > >>>>> Solr 8.6.0 is available for immediate download at:
> > >>>>>
> > >>>>>
> > >>>>>  <https://lucene.apache.org/solr/downloads.html>
> > >>>>>
> > >>>>>
> > >>>>> ### Solr 8.6.0 Release Highlights:
> > >>>>>
> > >>>>>
> > >>>>> * Cross-Collection Join Queries: Join queries can now work
> > >>>>> cross-collection, even when sharded or when spanning nodes.
> > >>>>>
> > >>>>> * Search: Performance improvement for some types of queries when
> > >> exact
> > >>>>> hit count isn't needed by using BlockMax WAND algorithm.
> > >>>>>
> > >>>>> * Streaming Expression: Percentiles and standard deviation
> > >> aggregations
> > >>>>> added to stats, facet and time series.  Streaming expressions added
> > to
> > >>>>> /export handler.  Drill Streaming Expression for efficient and
> > >> accurate
> > >>>>> high cardinality aggregation.
> > >>>>>
> > >>>>> * Package manager: Support for cluster (CoreContainer) level
> plugins.
> > >>>>>
> > >>>>> * Health Check: HealthCheckHandler can now require that all cores
> are
> > >>>>> healthy before returning OK.
> > >>>>>
> > >>>>> * Zookeeper read API: A read API at /api/cluster/zk/* to fetch raw
> ZK
> > >>>>> data and view contents of a ZK directory.
> > >>>>>
> > >>>>> * Admin UI: New panel with security info in admin UI's dashboard.
> > >>>>>
> > >>>>> * Query DSL: Support for {param:ref} and {bool: {excludeTags:""}}
> > >>>>>
> > >>>>> * Ref Guide: Major redesign of Solr's documentation.
> > >>>>>
> > >>>>>
> > >>>>> Please read CHANGES.txt for a full list of new features and
> changes:
> > >>>>>
> > >>>>>
> > >>>>>  <https://lucene.apache.org/solr/8_6_0/changes/Changes.html>
> > >>>>>
> > >>>>>
> > >>>>> Solr 8.6.0 also includes features, optimizations  and bugfixes in
> the
> > >>>>> corresponding Apache Lucene release:
> > >>>>>
> > >>>>>
> > >>>>>  <https://lucene.apache.org/core/8_6_0/changes/Changes.html>
> > >>>>>
> > >>>>>
> > >>>>> Note: The Apache Software Foundation uses an extensive mirroring
> > >> network
> > >>>>> for
> > >>>>>
> > >>>>> distributing releases. It is possible that the mirror you are using
> > >> may
> > >>>>> not have
> > >>>>>
> > >>>>> replicated the release yet. If that is the case, please try another
> > >>> mirror.
> > >>>>>
> > >>>>> This also applies to Maven access.
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
>


-- 
Anshum Gupta


Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Anshum Gupta
Hi everyone,

Moving a conversation that was happening on the PMC list to the public
forum. Most of the following is just me recapping the conversation that has
happened so far.

Some members of the community have been discussing getting rid of the
master/slave nomenclature from Solr.

While this may require a non-trivial effort, a general consensus so far
seems to be to start this process and switch over incrementally, if a
single change ends up being too big.

There have been a lot of suggestions around what the new nomenclature might
look like; a few people don’t want to overlap the naming here with what
already exists in SolrCloud, i.e. leader/follower.

Primary/Replica was an option that was suggested based on what other
vendors are moving towards, per Wikipedia:
https://en.wikipedia.org/wiki/Master/slave_(technology)
However, there were concerns around the use of “replica”, as that denotes a
very specific concept in SolrCloud. The current terminology clearly
differentiates the traditional replication model from SolrCloud, and reusing
the names would make it difficult to keep that distinction.

There were similar concerns around using Leader/follower.

Let’s continue this conversation here while making sure that we converge
without much bike-shedding.

-Anshum


Re: Solr process getting killed suddenly

2019-08-21 Thread Anshum Gupta
Hi Adriano,

Can you provide more information around what you are doing? Answers to
questions like the following would be very useful for anyone who might try
to help you here:

1. What version of Solr are you using?
2. Is this vanilla, or are you using something custom w.r.t. the code as
well as the config?
3. How are you running Solr? Do you see any logs at all, if so, can you
share those?

These are just some of the questions you can provide answers to; feel
free to add to these.

Also, is there a specific reason for you to run w/ a 25G heap?

On Wed, Aug 21, 2019 at 3:52 PM Adriano Rogério de O. Carolino de Melo <
carol...@gmail.com> wrote:

> Hi, does anybody know why the Solr Java process is terminated for no reason?
> The OOM script does not run, and the server logs do not show anything.
> Solr is running with 25g of Java heap and only using 20%.
>
> --
> *Adriano Melo*
> Tel.: (83) 98875-1868
>


-- 
Anshum Gupta


Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
I might have been wrong there. Having an explicit check for the # of results
returned vs. rows requested would allow you to avoid the last request that
would otherwise come back with 0 results. That check isn’t automatically done
within Solr.
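
A minimal SolrJ sketch of that loop with the early-exit check (the URL,
collection name, and uniqueKey field are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

HttpSolrClient client =
    new HttpSolrClient.Builder("http://localhost:8983/solr/mycoll").build();
int rows = 100;
SolrQuery q = new SolrQuery("*:*");
q.setRows(rows);
q.setSort(SolrQuery.SortClause.asc("id")); // cursorMark requires a uniqueKey sort
String cursor = CursorMarkParams.CURSOR_MARK_START;
while (true) {
  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
  QueryResponse rsp = client.query(q);
  // ... process rsp.getResults() ...
  String next = rsp.getNextCursorMark();
  // Stop when the cursor stops moving, or (the shortcut discussed above)
  // when a page comes back short, which saves the final empty request.
  if (next.equals(cursor) || rsp.getResults().size() < rows) break;
  cursor = next;
}
client.close();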

 Anshum


> On Jun 19, 2018, at 2:39 PM, Anshum Gupta  wrote:
> 
> Hi David,
> 
> The cursormark would be the same if you get back fewer than the max records 
> requested and so you should exit, as per the documentation.
> 
> I think the documentation says just what you are suggesting, but if you think 
> it could be improved, feel free to put up a patch.
> 
> 
>  Anshum
> 
> 
>> On Jun 18, 2018, at 2:09 AM, David Frese > <mailto:david.fr...@active-group.de>> wrote:
>> 
>> Hi List,
>> 
>> the documentation of 'cursorMarks' recommends fetching until a query returns
>> the cursorMark that was passed in to a request.
>> 
>> But that always requires an additional request at the end, so I wonder if I
>> can stop already if a request returns fewer results than requested (num
>> rows). There won't be new documents added during the search in my use case,
>> so could there ever be a non-empty 'page' after a non-full 'page'?
>> 
>> Thanks very much.
>> 
>> --
>> David Frese
>> +49 7071 70896 75
>> 
>> Active Group GmbH
>> Hechinger Str. 12/1, 72072 Tübingen
>> Registergericht: Amtsgericht Stuttgart, HRB 224404
>> Geschäftsführer: Dr. Michael Sperber
> 





Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
Hi David,

The cursormark would be the same if you get back fewer than the max records 
requested and so you should exit, as per the documentation.

I think the documentation says just what you are suggesting, but if you think 
it could be improved, feel free to put up a patch.


 Anshum


> On Jun 18, 2018, at 2:09 AM, David Frese  wrote:
> 
> Hi List,
> 
> the documentation of 'cursorMarks' recommends fetching until a query returns
> the cursorMark that was passed in to a request.
> 
> But that always requires an additional request at the end, so I wonder if I
> can stop already if a request returns fewer results than requested (num
> rows). There won't be new documents added during the search in my use case,
> so could there ever be a non-empty 'page' after a non-full 'page'?
> 
> Thanks very much.
> 
> --
> David Frese
> +49 7071 70896 75
> 
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber





Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
That explains it :)

I assume you did make those changes on disk and did not upload the updated 
configset to zookeeper.

SolrCloud instances use the configset from zk, so all changed files would have 
to be uploaded to zk.

You can re-upload the configset using the zkcli.sh script that comes with Solr
(or some other utility):
https://lucene.apache.org/solr/guide/7_3/command-line-utilities.html#using-solr-s-zookeeper-cli

You can also use this script: 
https://lucene.apache.org/solr/guide/7_3/using-zookeeper-to-manage-configuration-files.html#uploading-configuration-files-using-bin-solr-or-solrj

Here’s the config set API that can also be used to accomplish the same: 
https://lucene.apache.org/solr/guide/7_3/configsets-api.html#configsets-api-entry-points

Whatever mechanism you choose to upload the updated config, you should be able
to see the latest config in the Solr admin UI (assuming you have access to
that) via Cloud > Tree > configs > 


 Anshum


> On Jun 19, 2018, at 2:08 PM, Monique Monteiro  
> wrote:
> 
> I reloaded the collection with the command:
> 
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=documentos_ce
> 
> But still the same problem...
> 
> On Tue, Jun 19, 2018 at 4:48 PM Monique Monteiro 
> wrote:
> 
>> Hi Anshum,
>> 
>> I'm using SolrCloud, but both instances are on the same Solr installation
>> (it's just for test purposes), so I suppose they share configuration in
>> solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml.
>> 
>> So should I recreate the collection ?
>> 
>> Thanks,
>> Monique
>> 
>> On Tue, Jun 19, 2018 at 4:41 PM Anshum Gupta  wrote:
>> 
>>> Hi Monique,
>>> 
>>> Is this standalone Solr or SolrCloud? If it is cloud, then you’d have to
>>> make sure that you uploaded the right config, and the collection should
>>> also be reloaded if you enabled it after creating the collection.
>>> 
>>> Also, did you check the MLT Query parser? It does the same thing but
>>> doesn’t require registering the handler, etc. You can find its
>>> documentation here:
>>> https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
>>> 
 Anshum
>>> 
>>> 
>>> On Jun 19, 2018, at 11:00 AM, Monique Monteiro 
>>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I'm trying to access /mlt in Solr, but the index returns an HTTP 404 error.
>>> 
>>> I've already configured the following:
>>> 
>>> 
>>>  - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
>>> 
>>>   <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">
>>>     <lst name="defaults">
>>>       <str name="df">_text_</str>
>>>     </lst>
>>>   </initParams>
>>> 
>>> AND
>>> 
>>>   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>>>     <lst name="defaults">
>>>       <str name="mlt.fl">list</str>
>>>     </lst>
>>>   </requestHandler>
>>> 
>>> But none of this made "http://localhost:8983/solr/<collection name>/mlt?q=*:*"
>>> return anything other than 404.
>>> 
>>> Has anyone any idea about what may be happening?
>>> 
>>> Thanks in advance,
>>> 
>>> --
>>> Monique Monteiro
>>> 
>>> 
>>> 
>> 
>> --
>> Monique Monteiro
>> Blog: http://moniquelouise.spaces.live.com/
>> Twitter: http://twitter.com/monilouise
>> 
> 
> 
> --
> Monique Monteiro
> Blog: http://moniquelouise.spaces.live.com/
> Twitter: http://twitter.com/monilouise





Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
Hi Monique,

Is this standalone Solr or SolrCloud? If it is cloud, then you’d have to make
sure that you uploaded the right config, and the collection should also be
reloaded if you enabled it after creating the collection.

Also, did you check the MLT Query parser? It does the same thing but doesn’t
require registering the handler, etc. You can find its documentation here:
https://lucene.apache.org/solr/guide/7_3/other-parsers.html#more-like-this-query-parser
 


 Anshum


> On Jun 19, 2018, at 11:00 AM, Monique Monteiro  
> wrote:
> 
> Hi all,
> 
> I'm trying to access /mlt in Solr, but the index returns an HTTP 404 error.
> 
> I've already configured the following:
> 
> 
>   - /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml:
> 
>   <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,/mlt">
>     <lst name="defaults">
>       <str name="df">_text_</str>
>     </lst>
>   </initParams>
> 
> AND
> 
>   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>     <lst name="defaults">
>       <str name="mlt.fl">list</str>
>     </lst>
>   </requestHandler>
> 
> But none of this made "http://localhost:8983/solr/<collection name>/mlt?q=*:*"
> return anything other than 404.
> 
> Has anyone any idea about what may be happening?
> 
> Thanks in advance,
> 
> --
> Monique Monteiro





Re: [nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2018-03-14 Thread Anshum Gupta
Hi Jan,

The way I remember it was done (or at least we did it) is by storing the
depth information as a field in the document using an update request
processor and using a custom transformer to reconstruct the original
multi-level document from it.

Also, this was a reasonably long time ago, so things might have changed
since then.

Anshum

On Thu, Mar 24, 2016 at 12:53 PM Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> I think you cal already kick tires and contribute a test case into
> https://issues.apache.org/jira/browse/SOLR-8208 that's already reachable
> there I believe, but I still working on core design.
>
> On Thu, Mar 24, 2016 at 10:02 PM, Alisa Z.  wrote:
>
> >  Hi all,
> >
> > I apologize for duplicating my previous message:
> > Solr 5.3:  anything similar to ChildDocTransformerFactory  that does not
> > flatten the hierarchical structure?
> >
> > However, it is still an open and interesting question:
> >
> > Following the example from  https://dzone.com/articles/using-solr-49-new
> > , let's say we are given multiple-level nested structure:
> >
> > <doc>
> >   <field name="id">1</field>
> >   <field name="name">I am the parent</field>
> >   <field name="cat">PARENT</field>
> >   <doc>
> >     <field name="id">1.1</field>
> >     <field name="name">I am the 1st child</field>
> >     <field name="cat">CHILD</field>
> >   </doc>
> >   <doc>
> >     <field name="id">1.2</field>
> >     <field name="name">I am the 2nd child</field>
> >     <field name="cat">CHILD</field>
> >     <doc>
> >       <field name="id">1.2.1</field>
> >       <field name="name">I am a grandchildren</field>
> >       <field name="cat">GRANDCHILD</field>
> >     </doc>
> >   </doc>
> > </doc>
> >
> >
> > Querying
> > q={!parent which="cat:PARENT"}name:(I am +child)&fl=id,name,[child
> > parentFilter=cat:PARENT]
> >
> > will return flattened structure, where cat:CHILD and cat:GRANDCHILD
> > documents end up on the same level:
> > <doc>
> >   <field name="id">1</field>
> >   <field name="name">I am the parent</field>
> >   <field name="cat">PARENT</field>
> >   <doc>
> >     <field name="id">1.1</field>
> >     <field name="name">I am the 1st child</field>
> >     <field name="cat">CHILD</field>
> >   </doc>
> >   <doc>
> >     <field name="id">1.2</field>
> >     <field name="name">I am the 2nd child</field>
> >     <field name="cat">CHILD</field>
> >   </doc>
> >   <doc>
> >     <field name="id">1.2.1</field>
> >     <field name="name">I am a grandchildren</field>
> >     <field name="cat">GRANDCHILD</field>
> >   </doc>
> > </doc>
> >  Indeed, the Javadocs for ChildDocTransformerFactory say: "This
> > transformer returns all descendants of each parent document in a flat
> > list nested inside the parent document".
> >
> > Yet is there any way to preserve the hierarchy in the response? I really
> > need to find the way to preserve the structure in the response.
> >
> > Thank you in advance!
> >
> > --
> > Alisa Zhila
> > --
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Negative Core Node Numbers

2018-01-04 Thread Anshum Gupta
Hi Chris,

The core node numbers should be cleared out when the collection is deleted. Is 
that something you see consistently?

P.S: I just tried creating a collection with 1 shard and 200 replicas and saw 
the core node numbers as expected. On deleting and recreating the collection, I 
saw that the counter was reset. Just to be clear, I tried this on master.

-Anshum



> On Jan 4, 2018, at 12:16 PM, Chris Ulicny  wrote:
> 
> Hi,
> 
> In 7.1, how does solr determine the numbers that are assigned to the
> replicas? I'm familiar with the earlier naming conventions from 6.3, but I
> wanted to know if there was supposed to be any connection between the
> "_n##" suffix and the number assigned to the "core_node##" name since they
> don't seem to follow the old convention. As an example node from
> clusterstatus for a testcollection with replication factor 2.
> 
> "core_node91":{
>"core":"testcollection_shard22_replica_n84",
>    "base_url":"http://host:8080/solr",
>"node_name":"host:8080_solr",
>"state":"active",
>"type":"NRT",
>"leader":"true"}
> 
> Along the same lines, when creating the testcollection with 200 shards and
> replication factor of 2, I am also getting nodes that have negative numbers
> assigned to them which looks a lot like an int overflow issue. From the
> cluster status:
> 
>  "shard157":{
>"range":"47ae-48f4",
>"state":"active",
>"replicas":{
>  "core_node1675945628":{
>    "core":"testcollection_shard157_replica_n-1174535610",
>    "base_url":"http://host1:8080/solr",
>"node_name":"host1:8080_solr",
>"state":"active",
>"type":"NRT"},
>  "core_node1642259614":{
>    "core":"testcollection_shard157_replica_n-1208090040",
>    "base_url":"http://host2:8080/solr",
>"node_name":"host2:8080_solr",
>"state":"active",
>"type":"NRT",
>"leader":"true"}}}
> 
> This keeps happening even when the collection is successfully deleted (no
> directories or files left on disk), the entire cluster is shutdown, and the
> zookeeper chroot path cleared out of all content. The only thing that
> happened prior to this cycle was a single failed collection creation which
> seemed to clean itself up properly, after which everything was shutdown and
> cleaned from zookeeper as well.
> 
> Is there something else that is keeping track of those values that wasn't
> cleared out? Or is this now the expected behavior for the numerical
> assignments to replicas?
> 
> Thanks,
> Chris





Re: Protect a collection to be deleted

2017-12-13 Thread Anshum Gupta
From what I remember, you can set a custom permission for a specific user to be 
able to delete a collection, or not allow anyone to delete a specific 
collection.

Check out the “user defined permissions” section here: 
https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/
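
For reference, a hypothetical security.json fragment along those lines (the
plugin class is real; the user, role, and permission names are illustrative,
and the exact matching semantics are covered in the post above). It restricts
the Collections API DELETE action to an admin role:

{
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solradmin": "admin" },
    "permissions": [
      { "name": "collection-delete",
        "path": "/admin/collections",
        "params": { "action": ["DELETE"] },
        "role": "admin" }
    ]
  }
}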

-Anshum



> On Dec 13, 2017, at 7:20 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 12/12/2017 1:23 PM, Anshum Gupta wrote:
>> You might want to explore Rule based authorization in Solr and stop
>> non-admin users from deleting collections etc. Here’s the link to the
>> documentation: 
>> https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html
> 
> Because I've never used the authentication plugins, I have to ask: What
> kind of granularity does this offer?  Can it protect individual
> collections from being deleted, while allowing others to be deleted?
> When I read the documentation, I see something saying that the
> permission affects ALL collections, so I suspect that kind of
> granularity is not possible.
> 
> If authorization can be extended to allow per-collection permissions,
> that is one way to handle the use case, if the admin is already using
> authentication on their Solr instances.  I don't use authentication, and
> it would be quite painful for my ecosystem if I were to turn it on, so I
> would want to have something else available to protect collections from
> API actions.
> 
> Thanks,
> Shawn
> 





Re: Protect a collection to be deleted

2017-12-12 Thread Anshum Gupta
You might want to explore Rule based authorization in Solr and stop non-admin 
users from deleting collections etc. Here’s the link to the documentation: 
https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html 


-Anshum



> On Dec 12, 2017, at 9:27 AM, Yago Riveiro  wrote:
> 
> Hi,
> 
> Is it possible in Solr protect a collection to be deleted through a
> property?
> 
> Regards
> 
> 
> 
> 
> -
> Best regards
> 
> /Yago
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





Re: Need help detecting Relatedness in documents

2017-10-26 Thread Anshum Gupta
I would suggest you look at the mlt query parser. That allows you to find
documents similar to a particular document, and also allows for specifying the
field to use for similarity purposes.

https://lucene.apache.org/solr/guide/7_0/other-parsers.html#more-like-this-query-parser
 


-Anshum



> On Oct 26, 2017, at 1:16 AM, Atita Arora  wrote:
> 
> Hi ,
> 
> We're working on a product where the idea is to present the users the
> related documents in a particular time series.
> 
> For an overview, think about this as an application which picks up top
> trending blogpost "topics", which are picked and ingested from various
> social sites.
> Further, when you look into a topic from the trending list, it shows the
> related topics which occur on the same blogposts.
> So to mark a topic as related, the topics should have occurred on the same
> blogpost; to add, the more such co-occurrences there are, the higher the
> relatedness factor.
> 
> The complexity is that the related topics change with the user-defined date
> spread, which means that if x & y were the top most related topics in the
> blogposts made in the last 30 days, there is an equal possibility that x
> could be more related to z if the user wanted to see related topics for the
> last 60 days.
> So the number of days is user defined, and it impacts the related topics.
> 
> For now every blogpost goes into the index as a separate document, and the
> topic extraction happens alongside indexing, which extracts the topics from
> the blogposts and stores them in a different collection.
> Because of this we have a lot of duplicates in the index too; e.g. a
> topicname search for "football" has around 80K documents, all of them with
> topicname="football".
> 
> I wonder if someone can help me with:
> 1. How to structure the documents in such a way that the queries could be
> more performant
> 2. Suggestions as to how we can detect the RELATED topics.
> 
> Any help on this would be highly appreciated.
> 
> Thanks in advance.
> 
> Atita





Re: [ANNOUNCE] Apache Solr 7.0.0 released

2017-09-20 Thread Anshum Gupta
It’s strange, but something seems to have stripped off all the formatting from
the announce mail. Here’s a plain text version of the same; I hope this is
more readable.


20 September 2017, Apache Solr™ 7.0.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly 
scalable, providing fault tolerant distributed search and indexing, and powers 
the search and navigation features of many of the world's largest internet 
sites. 

Solr 7.0.0 is available for immediate download at: 
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See http://lucene.apache.org/solr/7_0_0/changes/Changes.html for a full list
of details. 

  * Replica Types - Solr 7 supports different replica types, which handle 
updates differently. In addition to pure NRT operation where all replicas build 
an index and keep a replication log, you can now also add so called PULL 
replicas, achieving the read-speed optimized benefits of a master/slave setup 
while at the same time keeping index redundancy. 

  * Auto-scaling. Solr can now allocate new replicas to nodes using a new auto 
scaling policy framework. This framework will in future releases enable Solr to 
move shards around based on load, disk etc. 

  * Indented JSON is now the default response format for all APIs, pass wt=xml 
and/or indent=off to use the previous unindented XML format. 

  * The JSON Facet API now supports two-phase facet refinement to ensure 
accurate counts and statistics for facet buckets returned in distributed mode. 

  * Streaming Expressions adds a new statistical programming syntax for the 
statistical analysis of sql queries, random samples, time series and graph 
result sets. 

  * Analytics Component version 2.0, which now supports distributed 
collections, expressions over multivalued fields, a new JSON request language, 
and more. 

  * The new v2 API, exposed at /api/ and also supported via SolrJ, is now the 
preferred API, but /solr/ continues to work. 

  * A new '_default' configset is used if no config is specified at collection 
creation. The data-driven functionality of this configset indexes strings as 
analyzed text while at the same time copying to a '*_str' field suitable for 
faceting. 

  * Solr 7 is tested with and verified to support Java 9. 

Being a major release, Solr 7 removes many deprecated APIs, changes various 
parameter defaults and behavior. Some changes may require a re-index of your 
content. You are thus encouraged to thoroughly read the "Upgrade Notes" at 
http://lucene.apache.org/solr/7_0_0/changes/Changes.html or in the 
CHANGES.txt file accompanying the release. 

Solr 7.0.0 also includes many other new features as well as numerous 
optimizations and bugfixes of the corresponding Apache Lucene release. 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases. It is possible that the mirror you are using may not 
have replicated the release yet. If that is the case, please try another 
mirror. This also goes for Maven access.

-Anshum



> On Sep 20, 2017, at 12:09 PM, Anshum Gupta <ansh...@apple.com> wrote:
> 
> 20 September 2017, Apache Solr™ 7.0.0 available
> 
> Solr is the popular, blazing fast, open source NoSQL search platform from the 
> Apache Lucene project. Its major features include powerful full-text search, 
> hit highlighting, faceted search, dynamic clustering, database integration, 
> rich document (e.g., Word, PDF) handling, and geospatial search. Solr is 
> highly scalable, providing fault tolerant distributed search and indexing, 
> and powers the search and navigation features of many of the world's largest 
> internet sites. 
> 
> Solr 7.0.0 is available for immediate download at: 
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> See http://lucene.apache.org/solr/7_0_0/changes/Changes.html for a full
> list of details. 
> 
> Replica Types - Solr 7 supports different replica types, which handle updates 
> differently. In addition to pure NRT operation where all replicas build an 
> index and keep a replication log, you can now also add so called PULL 
> replicas, achieving the read-speed opt

[ANNOUNCE] Apache Solr 7.0.0 released

2017-09-20 Thread Anshum Gupta
20 September 2017, Apache Solr™ 7.0.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly 
scalable, providing fault tolerant distributed search and indexing, and powers 
the search and navigation features of many of the world's largest internet 
sites. 

Solr 7.0.0 is available for immediate download at: 
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
See http://lucene.apache.org/solr/7_0_0/changes/Changes.html for a full list
of details. 

Replica Types - Solr 7 supports different replica types, which handle updates 
differently. In addition to pure NRT operation where all replicas build an 
index and keep a replication log, you can now also add so called PULL replicas, 
achieving the read-speed optimized benefits of a master/slave setup while at 
the same time keeping index redundancy. 
Auto-scaling. Solr can now allocate new replicas to nodes using a new auto 
scaling policy framework. This framework will in future releases enable Solr to 
move shards around based on load, disk etc. 
Indented JSON is now the default response format for all APIs, pass wt=xml 
and/or indent=off to use the previous unindented XML format. 
The JSON Facet API now supports two-phase facet refinement to ensure accurate 
counts and statistics for facet buckets returned in distributed mode. 
Streaming Expressions adds a new statistical programming syntax for the 
statistical analysis of sql queries, random samples, time series and graph 
result sets. 
Analytics Component version 2.0, which now supports distributed collections, 
expressions over multivalued fields, a new JSON request language, and more. 
The new v2 API, exposed at /api/ and also supported via SolrJ, is now the 
preferred API, but /solr/ continues to work. 
A new '_default' configset is used if no config is specified at collection 
creation. The data-driven functionality of this configset indexes strings as 
analyzed text while at the same time copying to a '*_str' field suitable for 
faceting. 
Solr 7 is tested with and verified to support Java 9. 
Being a major release, Solr 7 removes many deprecated APIs, changes various 
parameter defaults and behavior. Some changes may require a re-index of your 
content. You are thus encouraged to thoroughly read the "Upgrade Notes" at 
http://lucene.apache.org/solr/7_0_0/changes/Changes.html or in the 
CHANGES.txt file accompanying the release. 

Solr 7.0.0 also includes many other new features as well as numerous 
optimizations and bugfixes of the corresponding Apache Lucene release. 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases. It is possible that the mirror you are using may not 
have replicated the release yet. If that is the case, please try another 
mirror. This also goes for Maven access.


Anshum Gupta





Re: Unable to create core [collection] Caused by: null

2017-07-26 Thread Anshum Gupta
Hi Lucas,

It would be super useful if you provided more information with the question. A
few things you might want to include are: the version of Solr, how you started
it, the stack trace from the log, etc.


-Anshum



> On Jul 25, 2017, at 4:21 PM, Lucas Pelegrino  wrote:
> 
> Hey guys.
> 
> Trying to make solr work here, but I'm getting this error from this command:
> 
> $ ./solr create -c products -d /Users/lucaswxp/reduza-solr/products/conf/
> 
> Error CREATEing SolrCore 'products': Unable to create core [products]
> Caused by: null
> 
> I'm posting my solrconfig.xml, schema.xml and data-config.xml here:
> https://pastebin.com/fnYK9pSJ
> 
> The debug from log solr: https://pastebin.com/kVLMvBwZ
> 
> Not sure what to do, the error isn't very descriptive.



Trouble connecting to IRC

2017-06-29 Thread Anshum Gupta
Hi,

I’ve been having issues connecting to the freenode IRC server for about 45 min
now. Anyone else seeing something similar?


-Anshum





Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Anshum Gupta
Hi Ketan,

I just want to be sure about your understanding of the 'implicit' router.

Implicit router in Solr puts the onus of correctly routing the documents on
the user, instead of 'implicitly' or automatically routing them.
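
A hedged SolrJ sketch of what that means in practice (collection, configset,
and shard names are illustrative; assumes a recent SolrJ and a CloudSolrClient
named client):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.SolrInputDocument;

// With router.name=implicit you name the shards yourself, and you route
// each document yourself; Solr will not hash the id to pick a shard.
CollectionAdminRequest
    .createCollectionWithImplicitRouter("logs", "conf1", "shardA,shardB", 1)
    .process(client);

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
doc.addField("_route_", "shardA"); // the caller decides the target shard
client.add("logs", doc);
client.commit("logs");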

-Anshum

On Tue, Apr 4, 2017 at 2:01 AM Ketan Thanki  wrote:

>
> Hi,
>
> Need the help for how to apply 'implicit' routing in existing collections.
> e.g :  I have configure the 2 collections with each has 4 shard and 4
> replica so what changes should i
> do for apply ' implicit' routing.
>
> Please  do needful with some examples.
>
> Regards,
> Ketan.
>
>
>


Re: Solr Shard Splitting Issue

2017-01-30 Thread Anshum Gupta
I see a successful completion of the request in the logs here:

2017-01-18 14:43:55.439 INFO
(OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00)
[   ] o.a.s.c.o.SliceMutator Update shard state invoked for
collection: collection1 with message: {
  "shard1":"inactive",
  "collection":"collection1",
  "shard1_1":"active",
  "operation":"updateshardstate",
  "shard1_0":"active"}
2017-01-18 14:43:55.439 INFO
(OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00)
[   ] o.a.s.c.o.SliceMutator Update shard state shard1 to inactive
2017-01-18 14:43:55.439 INFO
(OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00)
[   ] o.a.s.c.o.SliceMutator Update shard state shard1_1 to active
2017-01-18 14:43:55.439 INFO
(OverseerStateUpdate-97304349976428549-10.1.1.78:4983_solr-n_00)
[   ] o.a.s.c.o.SliceMutator Update shard state shard1_0 to active


I think you might be looking at the admin UI to figure out the state
of the shards, and that might still be broken. Can you confirm the
state of the shard from the CLUSTERSTATUS API?
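
For reference, a sketch of that check with SolrJ (the collection name is
illustrative, and client is assumed to be a SolrClient; the response layout
is easiest to explore by printing it):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.util.NamedList;

NamedList<Object> rsp = CollectionAdminRequest.getClusterStatus()
    .setCollectionName("collection1")
    .process(client)
    .getResponse();
// Shard states live under cluster > collections > collection1 > shards.
System.out.println(rsp.findRecursive("cluster", "collections",
    "collection1", "shards"));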

Also, you shouldn't be invoking SPLITSHARD on the same shard multiple
times like you did when the non-async version failed.

-Anshum


On Thu, Jan 19, 2017 at 6:12 AM ekta  wrote:

> Hi Anshum,
>
> Thanks for the reply.
>
> I had a copy of the data that I was experimenting on, and anyway I was doing
> it later too, after I posted the mail. Some points I want to let you know:
>
> 1. This time I did not change the state in state.json.
> 2. Otherwise, I did the same steps as above, and still the data got frozen
> at 24GB in both shards (my parent shard had ~60GB).
> 3. Still, the state.json is showing
> 3.1 Parent -  Active
> 3.2 Child   - Construction
> 4. Yeah, I do have logs; I am attaching the file with the mail. Please check
> it out.
> 5. I did shard splitting by this command
>
> "
> http://10.1.1.78:4983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
> "
> in the browser, and I got a Timeout Exception in the browser. I am attaching
> the file which contains what the browser displayed.
> 6. The Details of the system(Amazon EC2 Instances) for which i am doing
> above steps is:
>  6.1 30GB RAM
>  6.2 4 cores
>  6.3 250 GB drive
> 7. Lastly, I googled the timeout exception that I got, and I found a reply
> by you on a post about the same, where you mentioned issuing the split
> shard command asynchronously; I tried that too. As a result, no doubt, I did
> not get the timeout exception from the browser, but the rest was all the
> same as mentioned above.
>
> Please tell me if any further details are required. Attached: solr.log,
> Browser_result.txt
>
>
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Shard-Splitting-Issue-tp4314145p4314813.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Advanced Document Routing Questions

2017-01-29 Thread Anshum Gupta
SolrCloud auto routes the documents to the correct shard leader, however
you would be able to reduce the extra hop by sending the document to the
correct shard. Here are a few posts that explain how the document routing
in SolrCloud works:

https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
https://lucidworks.com/2014/01/06/multi-level-composite-id-routing-solrcloud/

If the extra hop isn't something you are much bothered about, I wouldn't
suggest adding the complexity to your client code.

SolrJ, the Java client that Solr ships with, has a 'smart' client,
CloudSolrClient, that tracks the cluster state in ZooKeeper by keeping a
watch. It also contains caching logic and more to optimize sending requests to
a SolrCloud cluster. You might want to explore that and possibly use it
instead.
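
A small hedged sketch of that path (the ZooKeeper address, collection name,
and the "site!id" key scheme are illustrative):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
    .build();
client.setDefaultCollection("webdocs");

SolrInputDocument doc = new SolrInputDocument();
// compositeId router: the part before '!' is hashed to pick the shard, so
// all docs for "siteA" land together, and the client sends the update
// straight to that shard's leader (no extra hop).
doc.addField("id", "siteA!page-42");
client.add(doc);
client.commit();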

-Anshum


On Sun, Jan 29, 2017 at 9:49 AM GW  wrote:

> Hi folks,
>
> 1: Can someone point me to some good documentation on how this works? Or is
> it so simple that I'm over thinking?
>
> My understanding of document routing is that I might be able to check the
> hash of a shard with the hash of the document id and determine the exact
> node / document and know exactly where to send the request reducing
> Zookeeper traffic.
>
> I'm getting ready to deploy and I have used the recommended format in my
> doc id
>
> All my work is REST/curl -> Solrcloud
>
> I plan to watch the cluster status through the admin console REST API and
> build a list of OK servers to do the reads for the website.
>
> I have a crawler that will be running mostly 3 am Eastern to 3 am Pacific,
> outside the bulk of read activity. I plan to do all posts to whoever has
> Zookeeper according to the admin REST API.
>
> Can I get some reassurance? Be gentle, this is my very first SolrCloud
> deployment and it's going to production. I'm about to write scripts for
> something that I still feel I am weak in conceptually.
>
> When I'm done and I totally understand, I promise to publish a nice A - Z
> REST deployment HowTo for HA with class examples in (PHP,Perl,Python)/curl.
>
>
> Best regards,
>
> GW
>


Re: Solr Shard Splitting Issue

2017-01-18 Thread Anshum Gupta
Hi Ekta,

Rule#1 - You shouldn't forcefully and manually change the state unless you
know what you're doing and have performed all the checks.

Seems like the child shards were still getting created i.e. copying the
entire index from the parent shard when you manually switched. One of the
reasons for this could be that you ran out of disk on the leader node. You
might be able to get more information about that by looking at the logs,
and information from any cluster management tool that you might be using
that tracks metrics like disk usage etc. The shard split, actually creates
2 subshards on the same node as the original parent, practically
duplicating the data in a separate set of index directories.

Did you send more updates while this was going on? You still might be able
to restore things from the original parent by changing the clusterstate to
how it was before you issued SPLITSHARD (with only the parent shard, in the
active state). Before you do anything, I'd suggest you copy the indexes.

If you have any error logs, it would be good to share them here on the list
(if you can). Make sure you upload them to a file sharing service instead
of sending those as attachments to the mailing list.

-Anshum



On Mon, Jan 16, 2017 at 2:33 AM Ekta Bhalwara 
wrote:

> Hi ,
>
> I tried shard splitting with Solr version 6.3, with the following steps:
>
> Step 1 :
>
> I have issued
> "collections?action=SPLITSHARD==shard1"
>
> Step 2 :
>
> I noticed 2 child shard got created shard1_0 and shard1_1
>
> step 3 :
>
>   After completing step 2, I still see
>
> shard1 state : active
>
> AND
>
> shard1_0 and shard1_1 :
>
> state:construction
>
> I checked the state in state.json for nearly 48 hours, but the data
> copying got frozen upon reaching a certain range (for example: with 60GB
> of data in the parent node, after splitting, both child nodes got 24GB of
> data, and then the data copying into the children stopped). The state.json
> file was not changing further.
>
> Moreover, when I manually changed state.json (parent node from active to
> inactive and child nodes from construction to active) I suffered a huge
> loss of data. Please look into the issue from your side and let me know
> if any further information is required.
>
>
> --
>
> Thanks & Regards
> Ekta
>
>


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Anshum Gupta
Can you provide more information about:
- Are you using Solr in standalone or SolrCloud mode? What version of Solr?
- Why do you want this? Lack of disk space? Uneven distribution of data on
shards?
- Do you want this data together i.e. as part of a single collection?

You can check out the following APIs:
SPLITSHARD:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
MIGRATE:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12

Among other things, make sure you have enough spare disk-space before
trying out the SPLITSHARD API in particular.
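
If you do go the SPLITSHARD route, a hedged SolrJ sketch follows (collection
and shard names are illustrative; client is assumed to be a SolrClient).
Running the split asynchronously and polling REQUESTSTATUS avoids the HTTP
timeouts that long-running splits tend to hit:

import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.RequestStatusState;

// Kick off the split asynchronously, then poll until it finishes.
String asyncId = CollectionAdminRequest.splitShard("collection1")
    .setShardName("shard1")
    .processAsync(client);
RequestStatusState state;
do {
  Thread.sleep(5000);
  state = CollectionAdminRequest.requestStatus(asyncId)
      .process(client)
      .getRequestStatus();
} while (state == RequestStatusState.RUNNING || state == RequestStatusState.SUBMITTED);
System.out.println("SPLITSHARD finished with state: " + state);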

-Anshum



On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev  wrote:

> Perhaps you can copy this index into a separate location, remove odd and
> even docs from the former and latter indexes respectively, and then force
> merge to a single segment in both locations separately.
> Perhaps shard splitting in SolrCloud does something like that.
>
> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
> chnredd...@gmail.com>
> wrote:
>
> > Hi All,
> >
> >   My Solr server has a few large index files (say ~10G). I am looking
> > for some help on breaking them into smaller ones (each < 4G) to satisfy
> > my application requirements. Are there any such tools available?
> >
> > Appreciate your help.
> >
> > Thanks
> > NRC
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: MLT Java example for Solr 6.3

2016-12-23 Thread Anshum Gupta
Hi Todd,

You can query for similar documents using the MLT Query Parser. The code
would look something like:

// Assuming you want to use CloudSolrClient
CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost(zkHost)
    .build();
client.setDefaultCollection(COLLECTION_NAME);
QueryResponse queryResponse =
    client.query(new SolrQuery("{!mlt qf=foo}docId"));

Notice the *docId*, *qf*, and the *!mlt* part.
docId - External document ID/unique ID of the document you want to query for
qf - fields that you want to use for similarity (you can read more about it
here:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-MoreLikeThisQueryParser
)
!mlt - the query parser you want to use.


On Thu, Dec 22, 2016 at 3:01 PM  wrote:

> I am having trouble locating a decent example for using the MLT Java API
> in Solr 6.3. What I want is to retrieve document IDs that are similar to a
> given document ID.
>
> Todd Peterson
> Chief Embedded Systems Engineer
> Management Sciences, Inc.
> 6022 Constitution Ave NE
> Albuquerque, NM 87144
> 505-255-8611 <(505)%20255-8611> (office)
> 505-205-7057 <(505)%20205-7057> (cell)


Re: CREATEALIAS to non-existing collections

2016-12-09 Thread Anshum Gupta
I think that might have just been an oversight. We shouldn't allow creation
of an alias for non-existent collections.

On a similar note, I think we should also be clearing out the aliases when
we DELETE a collection.
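
For reference, the current behavior is easy to reproduce with SolrJ (alias
and collection names are illustrative; client is assumed to be a SolrClient):

// Today this succeeds even if "products" does not exist yet; requests to
// the alias simply 404 until the target collection is created.
CollectionAdminRequest.createAlias("search", "products").process(client);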

-Anshum

On Fri, Dec 9, 2016 at 12:57 PM Tomás Fernández Löbbe 
wrote:

> We currently support requests to CREATEALIAS to collections that don’t
> exist. Requests to this alias later result in 404s. If the target
> collection is later created, requests to the alias will begin to work. I’m
> wondering if someone is relying on this behavior, or if we should validate
> the existence of the target collections when creating the alias (and thus,
> fail fast in cases of typos or unexpected cluster state)
>
> Tomás
>


Re: Hackday next month

2016-09-22 Thread Anshum Gupta
Sure, seems like Tuesday works best :) I'll try and make it too.

On Thu, Sep 22, 2016 at 10:02 AM Charlie Hull <char...@flax.co.uk> wrote:

> On 21/09/2016 19:28, Trey Grainger wrote:
> > I know a bunch of folks who would be likely attend the hackday (including
> > committers) will have some other meetings on Wednesday before the
> > conference, so I think that Tuesday is actually a pretty good time to
> have
> > this.
>
> Wednesday is also Yom Kippur - we weren't sure how many people this
> might affect but figured it would be best to avoid it for the Hackday.
> In any case, the venue is all arranged now and people have signed up, so
> Tuesday it is. There will also be beer & pizza that evening!
>
> Cheers
>
> Charlie
> >
> > My 2 cents,
> >
> > Trey Grainger
> > SVP of Engineering @ Lucidworks
> > Co-author, Solr in Action
> >
> > On Wed, Sep 21, 2016 at 1:20 PM, Anshum Gupta <ans...@anshumgupta.net>
> > wrote:
> >
> >> This is good, but is there a way to do this on Wednesday instead?
> >> Considering that the conference starts on Thursday, perhaps it makes
> sense
> >>  to do it just a day before? Not sure about others, but it certainly
> would
> >> work much better for me.
> >>
> >> -Anshum
> >>
> >> On Wed, Sep 21, 2016 at 2:18 PM Charlie Hull <char...@flax.co.uk>
> wrote:
> >>
> >>> Hi all,
> >>>
> >>> If you're coming to Lucene Revolution next month in Boston, we're
> >>> running a Lucene-focused hackday (Lucene, Solr, Elasticsearch)
> >>> kindly hosted by BA Insight. There will be Lucene committers there,
> it's
> >>> free to attend and we also need ideas on what to do! Come and join us.
> >>>
> >>> http://www.meetup.com/New-England-Search-Technologies-NEST-Group/events/233492535/
> >>>
> >>> Cheers
> >>>
> >>> Charlie
> >>>
> >>> --
> >>> Charlie Hull
> >>> Flax - Open Source Enterprise Search
> >>>
> >>> tel/fax: +44 (0)8700 118334
> >>> mobile:  +44 (0)7767 825828
> >>> web: www.flax.co.uk
> >>>
> >>
> >
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


Re: Hackday next month

2016-09-21 Thread Anshum Gupta
This is good, but is there a way to do this on Wednesday instead?
Considering that the conference starts on Thursday, perhaps it makes sense
 to do it just a day before? Not sure about others, but it certainly would
work much better for me.

-Anshum

On Wed, Sep 21, 2016 at 2:18 PM Charlie Hull  wrote:

> Hi all,
>
> If you're coming to Lucene Revolution next month in Boston, we're
> running a Lucene-focused hackday (Lucene, Solr, Elasticsearch)
> kindly hosted by BA Insight. There will be Lucene committers there, it's
> free to attend and we also need ideas on what to do! Come and join us.
>
> http://www.meetup.com/New-England-Search-Technologies-NEST-Group/events/233492535/
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


Re: Solr Collection Create API queries

2016-09-09 Thread Anshum Gupta
If you want to build a monitoring tool that maintains a replication factor,
I would suggest you use the Collections APIs (ClusterStatus, AddReplica,
DeleteReplica, etc.) and manage this from outside of Solr. I don't want to
pull you back from trying to build something, but I think you'd be biting
off a lot in one bite if you take this up as the first thing to implement
within Solr.
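
To sketch what that outside-of-Solr approach might look like in SolrJ
(5.x/6.x setter style; the collection and shard names are hypothetical and
error handling is omitted):

// Inspect the cluster, then add a replica to an under-replicated shard.
CloudSolrClient client = new CloudSolrClient(zkHost);
new CollectionAdminRequest.ClusterStatus()
    .setCollectionName("myCollection")
    .process(client);               // walk the response to count replicas
new CollectionAdminRequest.AddReplica()
    .setCollectionName("myCollection")
    .setShardName("shard1")
    .process(client);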


On Fri, Sep 9, 2016 at 1:41 PM Swathi Singamsetty <
swathisingamsett...@gmail.com> wrote:

> I am experimenting on this functionality and see how the overseer monitors
> and keeps the minimum no of replicas up and running.
>
>
> In a heavy indexing/search flow, if any replica goes down we need to keep
> the minimum no. of replicas up and running to serve the traffic and
> maintain the availability of the cluster.
>
>
> Please let me know if you need more information.
>
> Can you point me to the git repo branch where I can dig deeper and see this
> functionality?
>
>
>
> Thanks,
> Swathi.
>
>
>
>
>
> On Fri, Sep 9, 2016 at 1:10 PM, Anshum Gupta <ans...@anshumgupta.net>
> wrote:
>
> > Just to clarify: I really think it's an XY problem here. I
> > still don't know what is being attempted/built.
> >
> > From the last email, it sounds like you want to build/support
> > auto-addition of replicas, but I would wait until you clarify the use
> > case to suggest anything.
> >
> > -Anshum
> >
> > On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > I think you're missing my point. The _feature_ may be there,
> > > you'll have to investigate. But it is not named "smartCloud" or
> > >  "autoManageCluster". Those terms
> > > 1> do not appear in the final patch.
> > > 2> do not appear in any file in Solr 6x.
> > >
> > > They were suggested names; what the final implementation
> > > used should be in the ref guide, although I admit the latter
> > > sometimes lags.
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
> > > <swathisingamsett...@gmail.com> wrote:
> > > > I am working on solr 6.0.0 to implement this feature.
> > > > I had a chat with Anshum and confirmed that this feature is available
> > in
> > > > 6.0.0 version.
> > > >
> > > >
> > > > The functionality is to allow the overseer to bring up
> > > > the minimum no. of replicas for each shard as per the
> > > > replicationFactor set.
> > > >
> > > > I will look into the ref guide as well.
> > > >
> > > > Thanks,
> > > > Swathi.
> > > >
> > > > On Friday, September 9, 2016, Erick Erickson <
> erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> You cannot just pick arbitrary parts of a JIRA discussion
> > > >> and expect them to work. JIRAs are places where
> > > >> discussion of alternatives takes place and the discussion
> > > >> often suggests ideas that are not incorporated
> > > >> in the final patch. The patch for the JIRA you mentioned,
> > > >> for instance, does not mention either of those parameters,
> > > >> which implies that they were simply part of the discussion
> > > >> and were never implemented.
> > > >>
> > > >> So this sounds like an "XY" problem. You're asking why
> > > >> properties aren't persisted when you really want to take
> > > >> advantage of some functionality. What is that functionality?
> > > >>
> > > >> BTW, I'd go by the ref guide rather than JIRAs unless you
> > > >> examine the patch and see that the discussion was
> > > >> implemented in the patch.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> > > >> <swathisingamsett...@gmail.com <javascript:;>> wrote:
> > > >> > Hi Team,
> > > >> >
> > > >> > To implement the feature "Persist and use the
> > > >> > replicationFactor,maxShardsPerNode at Collection level" am
> > > >> following
> > > >> > the steps mentioned in the jira ticket
> > > >> > https://issues.apache.org/jira/browse/SOLR-4808.
> > > >> >
> > > >> > I used the "smartCloud" and "autoManageCluster" properties to
> > create a
> > > >> > collection in the create collection API to allow the overseer to
> > > bring up
> > > >> > the minimum no. of replicas for each shard as per the
> > > replicationFactor
> > > >> set
> > > >> > . But these 2 properties did not persist in the cluster state.
> Could
> > > >> > someone let me know how to use these properties in this feature?
> > > >> >
> > > >> >
> > > >> >
> > > >> > Thanks & Regards,
> > > >> > Swathi.
> > > >>
> > >
> >
>


Re: Solr Collection Create API queries

2016-09-09 Thread Anshum Gupta
Just to clarify: I really think it's an XY problem here. I still don't know
what is being attempted/built.

From the last email, it sounds like you want to build/support auto-addition of
replicas, but I would wait until you clarify the use case to suggest anything.

-Anshum

On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson 
wrote:

> I think you're missing my point. The _feature_ may be there,
> you'll have to investigate. But it is not named "smartCloud" or
>  "autoManageCluster". Those terms
> 1> do not appear in the final patch.
> 2> do not appear in any file in Solr 6x.
>
> They were suggested names; what the final implementation
> used should be in the ref guide, although I admit the latter
> sometimes lags.
>
> Best,
> Erick
>
> On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
>  wrote:
> > I am working on solr 6.0.0 to implement this feature.
> > I had a chat with Anshum and confirmed that this feature is available in
> > 6.0.0 version.
> >
> >
> > The functionality is to allow the overseer to bring up
> > the minimum no. of replicas for each shard as per the replicationFactor
> > set.
> >
> > I will look into the ref guide as well.
> >
> > Thanks,
> > Swathi.
> >
> > On Friday, September 9, 2016, Erick Erickson 
> > wrote:
> >
> >> You cannot just pick arbitrary parts of a JIRA discussion
> >> and expect them to work. JIRAs are places where
> >> discussion of alternatives takes place and the discussion
> >> often suggests ideas that are not incorporated
> >> in the final patch. The patch for the JIRA you mentioned,
> >> for instance, does not mention either of those parameters,
> >> which implies that they were simply part of the discussion
> >> and were never implemented.
> >>
> >> So this sounds like an "XY" problem. You're asking why
> >> properties aren't persisted when you really want to take
> >> advantage of some functionality. What is that functionality?
> >>
> >> BTW, I'd go by the ref guide rather than JIRAs unless you
> >> examine the patch and see that the discussion was
> >> implemented in the patch.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> >> > wrote:
> >> > Hi Team,
> >> >
> >> > To implement the feature "Persist and use the
> >> > replicationFactor,maxShardsPerNode at Collection level" am
> >> following
> >> > the steps mentioned in the jira ticket
> >> > https://issues.apache.org/jira/browse/SOLR-4808.
> >> >
> >> > I used the "smartCloud" and "autoManageCluster" properties to create a
> >> > collection in the create collection API to allow the overseer to
> bring up
> >> > the minimum no. of replicas for each shard as per the
> replicationFactor
> >> set
> >> > . But these 2 properties did not persist in the cluster state. Could
> >> > someone let me know how to use these properties in this feature?
> >> >
> >> >
> >> >
> >> > Thanks & Regards,
> >> > Swathi.
> >>
>


[ANNOUNCE] Apache Solr 5.5.3 released

2016-09-09 Thread Anshum Gupta
09 September 2016, Apache Solr™ 5.5.3 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.5.3

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

This release includes 5 bug fixes since the 5.5.2 release.

The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.3

This release contains 2 critical fixes in particular:
* The number of TCP connections in CLOSE_WAIT state no longer spikes during
indexing.
* PeerSync no longer fails on a node restart due to an IndexFingerPrint
mismatch.

Please read CHANGES.txt for a detailed list of changes:

  https://lucene.apache.org/solr/5_5_3/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

-Anshum Gupta


Re: /select results different between 5.4 and 6.1

2016-08-19 Thread Anshum Gupta
The default similarity changed from TF-IDF to BM25 in 6.0.
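
If you need the old scoring for compatibility, the classic TF-IDF
implementation is still available: declare
<similarity class="solr.ClassicSimilarityFactory"/> in the schema (globally,
or per field type) to get the 5.x behavior back.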

On Fri, Aug 19, 2016 at 3:00 PM John Bickerstaff 
wrote:

> Bump!
>
> TL;DR Question: Are scores (and debug output) *expected* to be different
> between 5.4 and 6.1?
>
> On Thu, Aug 18, 2016 at 2:44 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Hi all,
> >
> > TL:DR -
> > Is it expected that the /select endpoint would produce different
> > scores/result order between versions 5.4 and 6.1?
> >
> >
> > (I'm aware that it's certainly possible I've done something different to
> > these environments, although at this point I can't see any difference in
> > configs etc... and I used a very simple search against /select to test
> this)
> >
> > == Detail ==
> >
> > I'm currently seeing different scoring and different result order when I
> > compare Solr results in the Admin console for a 5.4 and 6.1 environment.
> >
> > I'm using the /select endpoint to try to avoid any difference in
> > configuration.  To the best of my knowledge (and reading) I haven't ever
> > modified the xml for that endpoint.
> >
> > As I was looking into it, I saw that the debug output looks quite
> > different in 6.1...
> >
> > Any advice, including "You must have broken it yourself, that's
> > impossible" is much appreciated.
> >
> >
> >
> > Here's debug from the "old" 5.4 SolrCloud environment.  The id's are a
> > pain to read, but not only am I getting different scores, I'm getting
> > different docs (or docs in a clearly different order)
> >
> > "debug": { "rawquerystring": "chiari", "querystring": "chiari", "
> > parsedquery": "text:chiari", "parsedquery_toString": "text:chiari", "
> > explain": { "d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 =
> > weight(text:chiari in 26783) [ClassicSimilarity], result of:\n 4.3581347
> =
> > fieldWeight in 26783, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0
> > = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> > fieldNorm(doc=26783)\n", "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5":
> "\n4.3581347
> > = weight(text:chiari in 26792) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=26792)\n", "d01c32ad-e29d-4b65-9930-f8a6844a2613":
> "\n4.3581347
> > = weight(text:chiari in 27028) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27028)\n", "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a":
> "\n4.3581347
> > = weight(text:chiari in 27029) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27029)\n", "e1cb441d-9d60-482d-956b-3fbc964a17c1":
> "\n4.3581347
> > = weight(text:chiari in 27042) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27042)\n", "f87951f1-e163-4f17-a628-904b9df0c609":
> "\n4.3581347
> > = weight(text:chiari in 27043) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27043)\n", "caaa7ca1-34cb-44a8-8dd9-12c909db8c2d":
> "\n4.3581347
> > = weight(text:chiari in 27044) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27044, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27044)\n", "ada7a87e-725a-4533-b72e-3817af4c7179":
> "\n4.3581347
> > = weight(text:chiari in 27055) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27055, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27055)\n", "ac6d47fd-9a59-47d6-8cfb-11b34c7ded54":
> "\n4.3581347
> > = weight(text:chiari in 27056) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 27056, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=27056)\n", "4aaa7697-b26a-4bea-ba4e-70d18ea649f0":
> "\n4.3581347
> > = weight(text:chiari in 62240) [ClassicSimilarity], result of:\n
> 4.3581347
> > = fieldWeight in 62240, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> 0.625 =
> > fieldNorm(doc=62240)\n" }, "QParser": "LuceneQParser", 

Re: Creating a SolrJ Data Service to send JSON to Solr

2016-08-16 Thread Anshum Gupta
I would also suggest sending the JSON directly to the JSON endpoint, with
the mapping:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONUpdateConveniencePaths
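
If you'd rather stay in SolrJ than shell out to curl, a rough sketch of
posting raw JSON to that path (the collection name and JSON body are made
up, 'client' is an existing SolrClient, and exact client classes vary a bit
by version):

// Send a raw JSON document to /update/json/docs, bypassing any POJO mapping.
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/json/docs");
ContentStreamBase.StringStream body =
    new ContentStreamBase.StringStream("{\"id\":\"1\",\"title_t\":\"hello\"}");
body.setContentType("application/json");
req.addContentStream(body);
req.setParam("commit", "true");
client.request(req, "myCollection");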

On Tue, Aug 16, 2016 at 4:43 PM Alexandre Rafalovitch 
wrote:

> Why do you need a POJO? For Solr purposes, you could just get the
> field names from schema and use those to map directly from JSON to the
> 'addField' calls in SolrDocument.
>
> Do you need it for non-Solr purposes? Then you can search for generic
> Java dynamic POJO generation solution.
>
> Also, you could look at creating a superset rather than common-subset
> POJO and then ignore all unknown fields on Solr side by adding a
> dynamicField that matches '*' with everything (index, store,
> docValues) set to false.
>
> Regards,
>Alex.
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 17 August 2016 at 02:49, Jennifer Coston
>  wrote:
> >
> > Hello,
> > I am trying to write a data service using SolrJ that will allow me to
> > accept JSON through a REST API, create a Solr document, and write it to
> > multiple different Solr cores (depending on the core name specified). The
> > problem I am running into is that each core is going to have a different
> > schema. My current code has the common fields between all the schemas in
> a
> > data POJO which I then walk and set the values specified in the JSON to
> the
> > Solr Document. However, I don’t want to create a different class for each
> > schema to process the JSON and convert it to a Solr Document. Is there a
> > way to process the extra JSON fields that are not common between the
> > schemas and add them to the Solr Document, without knowing what they are
> > ahead of time? Is there a way to convert JSON to a Solr Document without
> > having to use a POJO?  An alternative I was looking into is to use the
> > SolrClient to get the schema fields, create a POJO, walk that POJO to
> > create a Solr Document and then add it to Solr but, it doesn’t seem to be
> > possible to obtain the fields this way.
> >
> > I know that the easiest way to add JSON to Solr would be to use a curl
> > command and send the JSON directly to Solr but this doesn’t match our
> > requirements, so I need to figure out a way to perform the same operation
> > using SolrJ. Any other ideas or suggestions would be greatly appreciated!
> >
> > Thank you,
> >
> > -Jennifer
>


Re: Create collection on all nodes using the Collection API

2016-08-11 Thread Anshum Gupta
Hi Alexandre,

You can use the CLUSTERSTATUS Collections API (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18)
to get a list of live nodes.
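
A rough SolrJ sketch (5.x-era API, assuming an existing CloudSolrClient
'client'; the response layout follows the CLUSTERSTATUS documentation, so
treat the casts as assumptions):

// Fetch the cluster status and read the live_nodes list.
CollectionAdminRequest.ClusterStatus statusRequest = new CollectionAdminRequest.ClusterStatus();
NamedList<Object> cluster =
    (NamedList<Object>) statusRequest.process(client).getResponse().get("cluster");
List<String> liveNodes = (List<String>) cluster.get("live_nodes");
int replicationFactor = liveNodes.size();   // one replica per live node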

-Anshum

On Thu, Aug 11, 2016 at 10:16 AM Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Hi,
>
> What would be the best/easiest way to create a collection (only one shard)
> using the Collection API and have a replica created on all live nodes?
>
> Using the 'create collection' API, I can use the 'replicationFactor'
> parameter and specify the number of replicas I want for my collection.  So
> if I have 3 live nodes I can say 'replicationFactor=3' and my collection
> will have a replica on all live nodes.  However I do not want to
> 'hardcode' my number of live nodes for obvious reasons, so because of that
> I have the following questions:
>
> 1) Is there a way to create a collection (only one shard) and having a
> replica of the shard on all live nodes?
>
> 2) Assuming #1 is not possible, is it possible to get the list of live
> nodes?  If I can get the list of live nodes I could derive the number
> required for the replicationFactor parameter.
>
> Thanks
>
> Alexandre Drouin
>


Re: Is it possible to force a Shard Leader change?

2016-07-28 Thread Anshum Gupta
I understand there could be many reasons, but if it is at all possible, I'd
suggest you upgrade to a more recent version of Solr.

With that, you'd get a ton of bug fixes, and also a bunch of APIs that
would help you with triggering a leader election.
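
For reference, on versions that have these APIs the rough sequence for
moving leadership looks like this (collection/shard/replica names are
placeholders):

/admin/collections?action=ADDREPLICAPROP&collection=coll&shard=shard1
    &replica=core_node2&property=preferredLeader&property.value=true
/admin/collections?action=REBALANCELEADERS&collection=coll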

On Tue, Jul 26, 2016 at 9:27 PM, Tim Chen <tim.c...@sbs.com.au> wrote:

> Hi Guys,
>
> I am running a Solr Cloud 4.10, with 4 Solr servers and 5 Zookeeper setup.
>
> Solr servers:
> solr01, solr02, solr03, solr04
>
> I have around 20 collections in Solr cloud, and there are 4 Shards for
> each Collection. For each Shard, I have 4 Replicas, and sitting on each
> Solr server, with one of them is the Shard Leader.
>
> The issue I am having right now is that all the shard leaders are pointing to
> the same server, e.g. solr01.  When there are document updates, they are all
> pushed to the leader. I really want to distribute the shard leaders across
> all 4 Solr servers.
>
> I noticed Solr 6 has a "REBALANCELEADERS" command to do that, but not
> available in Solr 4.
>
> Questions:
>
> 1, Is my setup OK, with 4 shards for each collection and 4 replicas for
> each shard? Each Solr server has a full set of documents.
> 2, To distribute the shard leaders to different Solr servers, can I somehow
> shut down a single replica that is currently a shard leader and force Solr
> to elect a different replica to be the new shard leader?
>
> Thanks guys!
>
> Regards,
> Tim
>
>
> [Roots Wednesday 27 July 8.30pm]<http://www.sbs.com.au/programs/roots/>
>



-- 
Anshum Gupta


Re: Schema Changes

2016-07-28 Thread Anshum Gupta
Hi Ethan,

If the new fields are something that the old documents are also supposed to
contain, you would need to reindex. e.g. in case you add a new copy field
or a new field in general that your raw document contains, you would need
to reindex.
If the new field would only be something that exists in future documents,
you wouldn't need to reindex.

-Anshum

On Thu, Jul 28, 2016 at 12:50 PM, Ethan <eh198...@gmail.com> wrote:

> Hi,
>
> We change our schema to add new fields 3-4 times a year.  Never modify
> existing fields.
>
> Some of my colleagues say it requires re-indexing. Does it?  None of the
> existing field has changed.  schema.xml is the only file that s modified.
> So what's the point in re-indexing?
>
> Appreciate any insight.
>
> Thanks
>



-- 
Anshum Gupta


Re: File Descriptor/Memory Leak

2016-07-07 Thread Anshum Gupta
I've created a JIRA to track this:
https://issues.apache.org/jira/browse/SOLR-9290

On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:

> Shalin, we're seeing that issue too (and actually actively debugging it
> these days). So far I can confirm the following (on a 2-node cluster):
>
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
>
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
>
> Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
> looked suspicious (SOLR-8451 and SOLR-8578), even though the changes look
> legit. That did not help, and honestly I've done that before we suspected
> it might be the SSL. Therefore I think those are "safe", but just FYI.
>
> When it does happen, the number of CLOSE_WAITs climbs very high, on the
> order of 30K+ entries in 'netstat'.
>
> When I say it does not reproduce on 5.4.1 I really mean the numbers don't
> go as high as they do in 5.5.1. Meaning, when running without SSL, the
> number of CLOSE_WAITs is smallish, usually less than 10 (I would
> separately like to understand why we have any in that state at all). When
> running with SSL and 5.4.1, they stay low, on the order of hundreds at
> most.
>
> Unfortunately running without SSL is not an option for us. We will likely
> roll back to 5.4.1, even if the problem exists there, but to a lesser
> degree.
>
> I will post back here when/if we have more info about this.
>
> Shai
>
> On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
> shalinman...@gmail.com>
> wrote:
>
> > I have myself seen this CLOSE_WAIT issue at a customer. I am running some
> > tests with different versions trying to pinpoint the cause of this leak.
> > Once I have some more information and a reproducible test, I'll open a
> jira
> > issue. I'll keep you posted.
> >
> > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
> > wrote:
> >
> > > Hello there,
> > > Our SolrCloud is experiencing an FD leak while running with SSL. This is
> > > occurring on the one machine that our program is sending data to. We
> > > have a total of three servers running as an ensemble.
> > >
> > > While running without SSL, the FD count remains quite constant at
> > > around 180 while indexing. Performing a garbage collection also clears
> > > almost the entire JVM memory.
> > >
> > > However, when indexing with SSL, the FDC grows polynomially. The count
> > > increases by a few hundred every five seconds or so, and easily reaches
> > > 50,000 within three to four minutes. Performing a GC frees most of the
> > > memory on the two machines our program isn't transmitting the data
> > > directly to. The last machine is unaffected by the GC, and neither
> > > memory nor the FDC resets until Solr is restarted on that machine.
> > >
> > > Performing a netstat reveals that the FDC mostly consists of
> > > TCP-connections in the state of "CLOSE_WAIT".
> > >
> > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Anshum Gupta


Re: Shard vs Replica

2016-07-06 Thread Anshum Gupta
A collection in SolrCloud is a logical entity that encapsulates documents
that conform to a shared schema. As a distributed system, the data needs to
be split and so the collection is logically split into 'Shards'.
Shard(s):
 * don't represent a physical index.
 * are logical entities

Replica:
 * is physical manifestation of a shard
 * is an actual lucene index
 * therefore, can independently serve requests and accept document updates
 * Unlike the dictionary meaning, it is not a 'replica' of anything but is
just a physical manifestation (I'm repeating this, I know)

Moving on, for each shard, there are a few things that need a single
controlling point, e.g. versioning the incoming documents and maintaining
optimistic concurrency. One of the replicas for each shard is given those
responsibilities and is called the 'leader'.
The leader changes via leader election. I'm not going to go into the
details of leader election and when it happens here.

All other non-leader replicas (we at times refer to them as followers)
receive updates from the leader, who versions the documents.

To sum it up, if you are a Java developer, in terms of analogy,
collections, and shards are classes but replicas are objects.

Imagine a 'wikipedia' collection. It may have 10 shards that split all of
wikipedia into 10 parts for the sake of manageability.
Depending upon our traffic, we may choose the number of replicas (called
replication factor) for each shard.

*NOTE*: a replication factor of 1 means, there is 1 replica for each shard
i.e. there is ONE physical index for each shard definition. In such a case,
this replica would also be the leader.

If the replication factor was 2, there would be 2 physical index copies of
each shard and one of the 2 would be assigned the role of a leader.
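
A quick SolrJ illustration of those numbers (5.x setter style, assuming an
existing CloudSolrClient 'client'; the configset name is hypothetical): 10
shards with a replication factor of 2 means 20 physical Lucene indexes in
the cluster.

CollectionAdminRequest.Create create = new CollectionAdminRequest.Create()
    .setCollectionName("wikipedia")
    .setNumShards(10)
    .setReplicationFactor(2)
    .setConfigName("wikipediaConf");   // hypothetical configset
create.process(client);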

Hope this helps.


On Wed, Jul 6, 2016 at 2:32 PM, John Doe <mailinglists...@gmail.com> wrote:

> Hey,
>
> I have asked the same question on the freenode channel, and people answered
> me, but I believe I still have doubts. Because I have never worked with
> such data store technologies before, I find it hard to understand what
> exactly a replica and a shard are in Solr. I believe once I understand what
> exactly these two are, I will be able to see the difference.
>
> According to the English dictionary, a replica is an exact copy of
> something, which sounds true to me, but what is a shard then, and how is it
> connected with all this? Can someone explain this briefly and
> give some examples?
>
> Thank you in advance
>



-- 
Anshum Gupta


Re: Disable leaders in SolrCloud mode

2016-05-16 Thread Anshum Gupta
I think you are approaching the problem all wrong. This seems to be what is
described as an X-Y problem (https://people.apache.org/~hossman/#xyproblem).
Can you tell us more about :
* What's your setup like? SolrCloud - Version, number of shards, is there
any custom code, etc.
* Did you start seeing this more recently? If so, what did you change?

To already answer your question, there is no way in SolrCloud to disable or
remove the concept of 'leaders'. However, there would be other ways to fix
your setup, and get rid of the issues you are facing once you share more
details.


On Mon, May 16, 2016 at 12:33 PM, Li Ding <li.d...@bloomreach.com> wrote:

> Hi all,
>
> We have a unique scenario where we don't need leaders in every collection
> to recover from failures.  The index never changes.  But we have faced
> problems where either zk marked a core as down while the core was fine for
> non-distributed queries, or, during a restart, the core never came up.  My
> question is: is there any simple way to disable leaders and
> leader election in SolrCloud?  We do use multi-shard and distributed
> queries.  But in our unique situation, we don't need leaders to maintain
> the correct status of the index.  So if we can get rid of that part, our
> Solr restarts will be more robust.
>
> Any suggestions will be appreciated.
>
> Thanks,
>
> Li
>



-- 
Anshum Gupta


Re: ConcurrentUpdateSolrClient Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2016-04-25 Thread Anshum Gupta
Hi Joe,

Can you confirm if the versions of Solr and SolrJ are in sync?

On Mon, Apr 25, 2016 at 10:05 AM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> This appears to be a bug that'll be fixed in 6.1:
> https://issues.apache.org/jira/browse/SOLR-7729
>
> On Fri, Apr 22, 2016 at 8:07 PM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > Joe this might be _version_ as in Solr's optimistic concurrency used in
> > atomic updates, etc
> >
> > http://yonik.com/solr/optimistic-concurrency/
> >
> > On Fri, Apr 22, 2016 at 5:24 PM Joe Lawson <
> > jlaw...@opensourceconnections.com> wrote:
> >
> > > I'm updating from a basic Solr Client to the ConcurrentUpdateSolrClient
> > and
> > > I'm hitting a really strange error. I cannot share the code but the
> > snippet
> > > is like:
> > >
> > > try (ConcurrentUpdateSolrClient solrUpdateClient =
> > >          new ConcurrentUpdateSolrClient("http://localhost:8983/solr", 1000, 1)) {
> > >     String _core = "lots";
> > >     List batch = docs.subList(batch_start, batch_end);
> > >     response = solrUpdateClient.add(_core, batch);
> > >     solrUpdateClient.commit(_core);
> > >     ...
> > > }
> > >
> > >
> > >
> > > Once the commit is called I get the following error:
> > >
> > > 17:17:22.585 [concurrentUpdateScheduler-1-thread-1-processing-http://
> > > //localhost:8983//solr]
> > > >> WARN  o.a.s.c.s.i.ConcurrentUpdateSolrClient - Failed to parse error
> > > >> response from http://localhost:8983/solr due to:
> > > >> java.lang.RuntimeException: Invalid version (expected 2, but 60) or
> > the
> > > >> data in not in 'javabin' format
> > > >
> > > > 17:17:22.588 [concurrentUpdateScheduler-1-thread-1-processing-http://
> > > //localhost:8983//solr]
> > > >> ERROR o.a.s.c.s.i.ConcurrentUpdateSolrClient - error
> > > >
> > > > org.apache.solr.common.SolrException: Not Found
> > > >
> > > >
> > > >>
> > > >>
> > > >> request: http://localhost:8983/solr/update?wt=javabin&version=2
> > > >
> > > > at
> > > >>
> > >
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:290)
> > > >> [solr-solrj-6.0.0.jar:6.0.0
> 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 -
> > > >> nknize - 2016-04-01 14:41:50]
> > > >
> > > > at
> > > >>
> > >
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:161)
> > > >> [solr-solrj-6.0.0.jar:6.0.0
> 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 -
> > > >> nknize - 2016-04-01 14:41:50]
> > > >
> > > > at
> > > >>
> > >
> >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> > > >> [solr-solrj-6.0.0.jar:6.0.0
> 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 -
> > > >> nknize - 2016-04-01 14:41:50]
> > > >
> > > > at
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > >> ~[na:1.8.0_92]
> > > >
> > > > at
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > >> ~[na:1.8.0_92]
> > > >
> > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_92]
> > > >
> > > >
> > > Any help or suggestions are appreciated.
> > >
> > > Cheers,
> > >
> > > Joe Lawson
> > >
> >
>



-- 
Anshum Gupta


Re: Anticipated Solr 5.5.1 release date

2016-04-15 Thread Anshum Gupta
Hi Tom,

I plan on getting a release candidate out for vote by Monday. If all goes
well, it'd be about a week from then for the official release.

On Fri, Apr 15, 2016 at 6:52 AM, Tom Evans <tevans...@googlemail.com> wrote:

> Hi all
>
> We're currently using Solr 5.5.0 and converting our regular old style
> facets into JSON facets, and are running in to SOLR-8155 and
> SOLR-8835. I can see these have already been back-ported to 5.5.x
> branch, does anyone know when 5.5.1 may be released?
>
> We don't particularly want to move to Solr 6, as we have only just
> finished validating 5.5.0 with our original queries!
>
> Cheers
>
> Tom
>



-- 
Anshum Gupta


Re: Release date for Solr 6.0

2016-04-07 Thread Anshum Gupta
Hi Ben,

The vote for 6.0 just passed about an hour ago and it should be just a matter
of at most 3-4 days, depending on the availability of the release
manager before the artifacts are published to all mirrors and the official
note is sent out.

On Thu, Apr 7, 2016 at 8:48 AM, Ben Earley <bdearle...@gmail.com> wrote:

> Hi there,
>
> My team has been using Solr 4 on a large distributed system and we are
> interested in upgrading to Solr 6 when the new version is released to
> leverage some of the new features, such as graph queries.  Is anyone able
> to provide any insight as to the release schedule for this new version?
>
> Thanks,
>
> Ben Earley
>



-- 
Anshum Gupta


Re: Adding configset in SolrCloud via API

2016-04-06 Thread Anshum Gupta
As of now, there's no way to do so. There were some efforts along those lines,
but they have been on hold.

-Anshum

> On Apr 6, 2016, at 12:21 PM, Don Bosco Durai  wrote:
> 
> Is there an equivalent of server/scripts/cloud-scripts/zkcli.sh  -zkhost 
> $zk_host -cmd upconfig -confdir $config_folder -confname $config_name using 
> APIs?
> 
> I want to bootstrap by uploading the configs via API. Once the configs are 
> uploaded, I am now able to do everything else via API.
> 
> Thanks
> 
> Bosco
> 
> 


Re: Solr 5.5 Security feature is not working on it.

2016-04-05 Thread Anshum Gupta
Hi Vijay,

Can you provide more information about what you were trying to do and why
you think this isn't working? The more details you can provide, the
better.
* What's your SolrCloud setup
* How did you enable security
* What do you expect ?
* What do you see ?


On Tue, Apr 5, 2016 at 1:02 PM, Vijayakumar Ramdoss <nellaivi...@live.com>
wrote:

> Hi All,
> We recently started leveraging Solr 5.5 in Cloud mode, and are enabling
> security in SolrCloud. It's not working; looking for your advice to debug
> the issue.
>
> cat security.json
> {
>   "authentication": {
>     "class": "solr.BasicAuthPlugin",
>     "blockUnknown": true,
>     "credentials": {
>       "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
>     }
>   },
>   "authorization": {
>     "class": "solr.RuleBasedAuthorizationPlugin",
>     "permissions": [{"name": "security-edit", "role": "admin"}],
>     "user-role": {"solr": "admin"}
>   }
> }
>
> Thanks
> Vijay




-- 
Anshum Gupta


Re: Parallel Updates

2016-04-04 Thread Anshum Gupta
Solr would push all updates to the shards that are supposed to host the
data. Documents are first forwarded to the leader of the relevant shard
(which can change dynamically); the leader is responsible for versioning
and for ensuring replication across the followers. Other than that, all
nodes would be equally loaded in most regular situations.

On Mon, Apr 4, 2016 at 3:37 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Does SOLR cloud push indexing across all nodes?  I've been planning 4 SOLR
> boxes with only 3 exposed via the load balancer, leaving the 4th available
> internally for my microservices to hit with indexing work.
>
> I was assuming that if I hit my "solr4" IP address, only "solr4" will do
> the indexing...  Perhaps I'm making a dangerous assumption?
> On Apr 4, 2016 3:49 PM, "Anshum Gupta" <ans...@anshumgupta.net> wrote:
>
> The short answer is - There's no real limit on Solr in terms of
> concurrency.
>
> Here are a few things that would impact your numbers though:
> * What version of Solr are you using and how ? i.e. SolrCloud, standalone,
> traditional replication ?
> * Do you use atomic updates?
> * How do you index ?
>
> Assuming you are on SolrCloud, you wouldn't be able to have a dedicated
> indexing node.
>
> There are a ton of other settings you could read about and tweak to get
> good throughput but in general, multi-threading is highly recommended in
> terms of indexing.
>
>
> On Mon, Apr 4, 2016 at 2:33 PM, Robert Brown <r...@intelcompute.com> wrote:
>
> > Hi,
> >
> > Does Solr have any sort of limit when attempting multiple updates, from
> > separate clients?
> >
> > Are there any safe thresholds one should try to stay within?
> >
> > I have an index of around 60m documents that gets updated at key points
> > during the day from ~200 downloaded files - I'd like to fork off multiple
> > processes to deal with the incoming data to get it into Solr quicker.
> >
> > Thanks,
> > Rob
> >
> >
> >
>
>
> --
> Anshum Gupta
>



-- 
Anshum Gupta


Re: Parallel Updates

2016-04-04 Thread Anshum Gupta
The short answer is - There's no real limit on Solr in terms of
concurrency.

Here are a few things that would impact your numbers though:
* What version of Solr are you using and how ? i.e. SolrCloud, standalone,
traditional replication ?
* Do you use atomic updates?
* How do you index ?

Assuming you are on SolrCloud, you wouldn't be able to have a dedicated
indexing node.

There are a ton of other settings you could read about and tweak to get
good throughput but in general, multi-threading is highly recommended in
terms of indexing.
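
As a rough sketch of what that can look like with a single shared,
thread-safe client (the thread count, file handling, and parse(...) helper
are all hypothetical):

// One CloudSolrClient shared across worker threads; each worker adds its
// own batch and a single commit happens at the end.
CloudSolrClient client = new CloudSolrClient(zkHost);
client.setDefaultCollection("products");
ExecutorService pool = Executors.newFixedThreadPool(8);
for (final File file : downloadedFiles) {
    pool.submit(() -> client.add(parse(file)));  // parse(file) -> List<SolrInputDocument>
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);
client.commit();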


On Mon, Apr 4, 2016 at 2:33 PM, Robert Brown <r...@intelcompute.com> wrote:

> Hi,
>
> Does Solr have any sort of limit when attempting multiple updates, from
> separate clients?
>
> Are there any safe thresholds one should try to stay within?
>
> I have an index of around 60m documents that gets updated at key points
> during the day from ~200 downloaded files - I'd like to fork off multiple
> processes to deal with the incoming data to get it into Solr quicker.
>
> Thanks,
> Rob
>
>
>


-- 
Anshum Gupta


Re: [possible bug]: [child] - ChildDocTransformerFactory returns top level documents nested under middle level documents when queried for the middle level ones

2016-03-30 Thread Anshum Gupta
I'm not the best person to comment on this, so perhaps someone else could
chime in as well, but can you try using a wildcard for your childFilter?
Something like: childFilter=type_s:doc.enriched.text.*

You could also possibly enrich the documents with depth information and use
that for filtering.
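
In other words, something along these lines (sketched from the query in
your mail; untested):

/select?q={!parent which="type_s:doc.enriched.text"}type_s:doc.enriched.text.entities
    +text_t:pjm +type_t:Company +relevance_tf:[0.7%20TO%20*]
    &fl=*,[child parentFilter=type_s:doc.enriched.text
           childFilter=type_s:doc.enriched.text.* limit=1000]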

On Wed, Mar 30, 2016 at 11:34 AM, Alisa Z. <prol...@mail.ru> wrote:

>  I think I am observing an unexpected behavior of
> ChildDocTransformerFactory.
>
> The query is like this:
>
> /select?q={!parent which="type_s:doc.enriched.text"}type_s:doc.enriched.text.entities
>  +text_t:pjm +type_t:Company +relevance_tf:[0.7%20TO%20*]
>  &fl=*,[child parentFilter=type_s:doc.enriched.text limit=1000]
>
> The levels of hierarchy are shown in the  type_s field.  So I am querying
> on some descendants and returning some ancestors that are somewhere in the
> middle of the hierarchy. I also want to get all the nested documents
> below  that middle level.
>
> Here is the result:
>
> <doc>
>   type_s: doc.enriched.text     // this is the level I wanted to get to
>                                 // and then go down from it
>   ...
>   13565
>   <doc>
>     type_s: doc.enriched        // This is a document from 1 level up, the parent of the
>                                 // current type_s:doc.enriched.text document -- why is it here?
>     22024
>   </doc>
>   <doc>
>     type_s: doc.original        // This is an "uncle"
>     26698
>   </doc>
>   <doc>
>     type_s: doc                 // and this is a grandparent!!!
>   </doc>
> </doc>
>
> And so on, bringing the whole tree up and down all under my middle-level
> document.
> I really hope this is not the expected behavior.
>
> I appreciate your help in advance.
>
> --
> Alisa Zhila




-- 
Anshum Gupta


Re: Unable to create collection in 5.5

2016-03-28 Thread Anshum Gupta
I'm not sure why this would be a problem, as older collections would
continue to work just fine. Do you mean that the restriction doesn't allow
you to, e.g., add a shard with a valid name to an older collection?

On Mon, Mar 28, 2016 at 9:22 AM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

> This kind of stuff can't be released without a way to rename the current
> collections with hyphens (even for 6.0)
>
>
>
> --
>
> Yago Riveiro
>
>
> On Mar 28 2016, at 5:19 pm, Anshum Gupta ans...@anshumgupta.net
> wrote:
>
> > Yes, this was added in 5.5, though I think it shouldn't have been,
> especially the hyphens.
> The hyphen was added back as part of SOLR-8725, but it will only be available
> with 6.0 (and 5.5.1).
>
> >
>
> >
> On Mon, Mar 28, 2016 at 7:36 AM, Yago Riveiro yago.rive...@gmail.com
> 
> wrote:
>
> >
>
> >  Hi,
> 
>  With Solr 5.5 I can't create a collection with the name collection-16,
> and
>  in 5.3.1 I can do it. Why?
>
>  <?xml version="1.0" encoding="UTF-8"?>
>  <response>
>    <lst name="responseHeader">
>      <int name="status">400</int>
>      <int name="QTime">1</int>
>    </lst>
>    <lst name="error">
>      <lst name="metadata">
>        <str name="error-class">org.apache.solr.common.SolrException</str>
>        <str name="root-error-class">org.apache.solr.common.SolrException</str>
>      </lst>
>      <str name="msg">Invalid name: 'collection-16' Identifiers must consist entirely of periods, underscores and alphanumerics</str>
>      <int name="code">400</int>
>    </lst>
>  </response>
>  /response
> 
> 
> 
>  -
>  Best regards
>  --
>  View this message in context:
>  http://lucene.472066.n3.nabble.com/Unable-to-create-collection-in-5-5-tp4266437.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
> 
>
> >
>
> > \--
> Anshum Gupta
>
>


-- 
Anshum Gupta


Re: Unable to create collection in 5.5

2016-03-28 Thread Anshum Gupta
Yes, this was added in 5.5, though I think it shouldn't have been,
especially the hyphens.
The hyphen was added back as part of SOLR-8725, but it will only be available
with 6.0 (and 5.5.1).


On Mon, Mar 28, 2016 at 7:36 AM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

> Hi,
>
> With Solr 5.5 I can't create a collection with the name collection-16, and
> in 5.3.1 I can do it. Why?
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">1</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">Invalid name: 'collection-16' Identifiers must consist entirely of periods, underscores and alphanumerics</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-create-collection-in-5-5-tp4266437.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Anshum Gupta


Re: Re[2]: [nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2016-03-25 Thread Anshum Gupta
Hi Alisa,

The issue here is still open so it seems highly unlikely that it would even
get to 6.0, which is around the corner. I think this would only be out with
6.1 at the earliest.

On Fri, Mar 25, 2016 at 11:12 AM, Alisa Z. <prol...@mail.ru> wrote:

>  Mikhail,
> Thank you for the answer.
> I'd be happy to contribute tons of test cases on nested structures and
> their querying and faceting...
> I am working on a case of moving very nested data structures to Solr (and
> the other option is ES...) but so far Solr seems to be quite behind... It's
> great to see that it is moving in that direction though. I am happy to
> provide the use-cases (that are out of eCommerce actually) and publicly
> available test-cases.
>
> Is it correct that the patch will appear in a release version no sooner
> than Solr 6.0 or even later?
>
> Thanks,
> Alisa
>
> >Четверг, 24 марта 2016, 15:52 -04:00 от Mikhail Khludnev <
> mkhlud...@griddynamics.com>:
> >
> >I think you can already kick the tires and contribute a test case into
> >https://issues.apache.org/jira/browse/SOLR-8208 - that's already reachable
> >there I believe, but I'm still working on the core design.
> >
> >On Thu, Mar 24, 2016 at 10:02 PM, Alisa Z. < prol...@mail.ru > wrote:
> >
> >>  Hi all,
> >>
> >> I apologize for duplicating my previous message:
> >> Solr 5.3:  anything similar to ChildDocTransformerFactory  that does not
> >> flatten the hierarchical structure?
> >>
> >> However, it is still an open and interesting question:
> >>
> >> Following the example from
> https://dzone.com/articles/using-solr-49-new
> >> , let's say we are given multiple-level nested structure:
> >>
> >> <doc>
> >>   <field name="id">1</field>
> >>   <field name="name">I am the parent</field>
> >>   <field name="cat">PARENT</field>
> >>   <doc>
> >>     <field name="id">1.1</field>
> >>     <field name="name">I am the 1st child</field>
> >>     <field name="cat">CHILD</field>
> >>   </doc>
> >>   <doc>
> >>     <field name="id">1.2</field>
> >>     <field name="name">I am the 2nd child</field>
> >>     <field name="cat">CHILD</field>
> >>     <doc>
> >>       <field name="id">1.2.1</field>
> >>       <field name="name">I am a grandchildren</field>
> >>       <field name="cat">GRANDCHILD</field>
> >>     </doc>
> >>   </doc>
> >> </doc>
> >>
> >>
> >> Querying
> >> q={!parent which="cat:PARENT"}name:(I am +child)&fl=id,name,[child
> >> parentFilter=cat:PARENT]
> >>
> >> will return a flattened structure, where cat:CHILD and cat:GRANDCHILD
> >> documents end up on the same level:
> >> <doc>
> >>   <field name="id">1</field>
> >>   <field name="name">I am the parent</field>
> >>   <field name="cat">PARENT</field>
> >>   <doc>
> >>     <field name="id">1.1</field>
> >>     <field name="name">I am the 1st child</field>
> >>     <field name="cat">CHILD</field>
> >>   </doc>
> >>   <doc>
> >>     <field name="id">1.2</field>
> >>     <field name="name">I am the 2nd child</field>
> >>     <field name="cat">CHILD</field>
> >>   </doc>
> >>   <doc>
> >>     <field name="id">1.2.1</field>
> >>     <field name="name">I am a grandchildren</field>
> >>     <field name="cat">GRANDCHILD</field>
> >>   </doc>
> >> </doc>
> >>  Indeed, the Javadocs for ChildDocTransformerFactory say: "This
> >> transformer returns all descendants of each parent document in a flat
> list
> >> nested inside the parent document".
> >>
> >> Yet is there any way to preserve the hierarchy in the response? I really
> >> need to find the way to preserve the structure in the response.
> >>
> >> Thank you in advance!
> >>
> >> --
> >> Alisa Zhila
> >> --
> >>
> >
> >
> >
> >--
> >Sincerely yours
> >Mikhail Khludnev
> >Principal Engineer,
> >Grid Dynamics
> >
> >< http://www.griddynamics.com >
> >< mkhlud...@griddynamics.com >
>
>


-- 
Anshum Gupta


Re: Solrj , how to create collection

2016-03-19 Thread Anshum Gupta
Are you running Solr in Cloud (ZooKeeper aware) mode? If so, manual
creation of a core is actually not something that is supported. It works, but
it's not supported.

Assuming you _are_ running in cloud mode, the answer to your question is
yes. Provided you upload the configuration to be used by SolrCloud before
the collection creation command.
It seems like we are lacking documentation for this, but you can
explore the subclasses of CollectionAdminRequest.

Here's an example how you'd do that in Solr 5.5:

CloudSolrClient cloudClient = new CloudSolrClient(zkHostString);

CollectionAdminRequest.Create req = new CollectionAdminRequest.Create();
CollectionAdminResponse response = req.setCollectionName("foo")
        .setReplicationFactor(1)
        .setConfigName("bar")
        .process(cloudClient);

You can then parse the response.

Hope this helps.

On Sat, Mar 19, 2016 at 4:44 PM, Iana Bondarska <yana2...@gmail.com> wrote:

> Hi,
> Could you please tell me, is it possible to create a new collection on a Solr
> server using only SolrJ, without manual creation of a core folder on the server?
> I'm using SolrJ v5.5.0, standalone client.
>
> Thanks,
> Iana
>



-- 
Anshum Gupta


Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-18 Thread Anshum Gupta
I'd suggest using CloudSolrClient. It uses ConcurrentUpdateSolrClient under
the hood and is zk-aware, so it would route the documents from the client to
your Solr nodes correctly, saving you an extra hop.
Another thing to remember here is to reuse the Solr client, as it is
thread-safe.

Reading up about commits would also be useful and this blog by Erick
Erickson is a good place to learn about that:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

In terms of running SolrJ on each node, you could just run a single
multi-threaded indexer that gets data from your database and injects it
into Solr. This process would run outside of Solr and could potentially run
anywhere.
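
A rough sketch of such an external indexer (the JDBC URL, query, and field
names are all made up; 'client' is a shared CloudSolrClient; error handling
omitted):

// Stream rows out of the database and push them to SolrCloud in batches,
// reusing one thread-safe CloudSolrClient for the whole run.
try (Connection conn = DriverManager.getConnection(jdbcUrl);
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT id, title FROM records")) {
    List<SolrInputDocument> batch = new ArrayList<>();
    while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("title_t", rs.getString("title"));
        batch.add(doc);
        if (batch.size() == 1000) { client.add(batch); batch.clear(); }
    }
    if (!batch.isEmpty()) client.add(batch);
    client.commit();
}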

As far as routing goes, I suggest you just try the default composite id
router unless you hit issues there. If you do you could read up about how
routing in SolrCloud works here:
https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/

and also about advanced concepts here:
https://lucidworks.com/blog/2014/01/06/multi-level-composite-id-routing-solrcloud/



On Thu, Feb 18, 2016 at 2:08 PM, Colin Freas <cfr...@stsci.edu> wrote:

>
> Thanks for the info, Anshum.
>
> Writing up a SolrJ program to do this is entirely within my wheelhouse.
>
> Read through some of the SolrJ docs and found some examples to start.
>
> A handful of questions if anyone has some pointers.
>
> 1. From a performance perspective, is it worth it to use
> ConcurrentUpdateSolrServer? Also, documentation says best for updates;
> does that include adding documents?
>
> 2. When I run the importer via my SolrJ program to distribute the
> indexing, I'll create some kind of Solr client within SolrJ and point them
> at zookeeper.  But the communication with the SQL Server db is independent
> of the communication with zookeeper, right?  In that case, is it
> possible/does it make sense to run the SolrJ program on each node, so that
> each node communicates with the DB but they're both communicating with zk?
>
> One more question: for document routing to specific shards, the particular
> documents I have don't really have a natural way for routing.  Even if
> they did, my intuition is that I want the documents randomly and evenly
> distributed across all the machines in the cluster that will perform the
> querying.  Or is that intuition wrong, and it's better to have documents
> that fit a search criterion sorted in some way and placed near each other
> on a single or small number of machines?
>
> Any insights much appreciated!
>
> -Colin
>
>
>
> On 2/18/16, 2:01 AM, "Anshum Gupta" <ans...@anshumgupta.net> wrote:
>
> >Hi Colin,
> >
> >As of when I last checked, DIH works with SolrCloud but has its
> >limitations. It was designed for the non-cloud mode and is single
> >threaded.
> >It runs on whatever node you set it up on and that node might not host the
> >leader for the shard a document belongs to, adding an extra hop for those
> >documents.
> >
> >SolrCloud is designed for multi-threaded indexing and I'd highly recommend
> >you to use SolrJ to speed up your indexing. Yes, that would involve
> >writing
> >some code but it would speed things up considerably.
> >
> >
> >On Wed, Feb 17, 2016 at 10:51 PM, Colin Freas <cfr...@stsci.edu> wrote:
> >
> >>
> >> I just set up a SolrCloud instance with 2 Solr nodes & another machine
> >> running zookeeper.
> >>
> >> I've imported 200M records from a SQL Server database, and those records
> >> are split nicely between the 2 nodes.  Everything seems ok.
> >>
> >> I did the data import via the admin ui.  It took not quite 8 hours,
> >>which
> >> I guess is fine.  So, in the middle of the import I checked to see what
> >>was
> >> connected to the SQL Server machine.  It turned out that only the node
> >>that
> >> I had started the import on was actually connected to my database
> >>server.
> >>
> >> Is that the expected behavior?  Is there any way to have all nodes of a
> >> SolrCloud index communicate with the database during the indexing?
> >>Would
> >> that speed up indexing?  Maybe this isn't a bottleneck I should be
> >>worried
> >> about.
> >>
> >> Thanks,
> >> -Colin
> >>
> >
> >
> >
> >--
> >Anshum Gupta
>
>


-- 
Anshum Gupta


Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-17 Thread Anshum Gupta
Hi Colin,

As of when I last checked, DIH works with SolrCloud but has its
limitations. It was designed for the non-cloud mode and is single threaded.
It runs on whatever node you set it up on and that node might not host the
leader for the shard a document belongs to, adding an extra hop for those
documents.

SolrCloud is designed for multi-threaded indexing and I'd highly recommend
you to use SolrJ to speed up your indexing. Yes, that would involve writing
some code but it would speed things up considerably.


On Wed, Feb 17, 2016 at 10:51 PM, Colin Freas <cfr...@stsci.edu> wrote:

>
> I just set up a SolrCloud instance with 2 Solr nodes & another machine
> running zookeeper.
>
> I’ve imported 200M records from a SQL Server database, and those records
> are split nicely between the 2 nodes.  Everything seems ok.
>
> I did the data import via the admin ui.  It took not quite 8 hours, which
> I guess is fine.  So, in the middle of the import I checked to see what was
> connected to the SQL Server machine.  It turned out that only the node that
> I had started the import on was actually connected to my database server.
>
> Is that the expected behavior?  Is there any way to have all nodes of a
> SolrCloud index communicate with the database during the indexing?  Would
> that speed up indexing?  Maybe this isn’t a bottleneck I should be worried
> about.
>
> Thanks,
> -Colin
>



-- 
Anshum Gupta


Re: Request for SOLR-wiki edit permissions

2016-02-08 Thread Anshum Gupta
Done.

On Mon, Feb 8, 2016 at 9:55 AM, Jason Gerlowski <gerlowsk...@gmail.com>
wrote:

> Hi all,
>
> Can someone please give me edit permissions for the Solr wiki.  Is
> there anything I should or need to do to get these permissions?  My
> wiki username is "Jason.Gerlowski", and my wiki email is
> "gerlowsk...@gmail.com".
>
> I spotted a few things that could use some clarification on the
> HowToContribute page (https://wiki.apache.org/solr/HowToContribute)
> and wanted to make them a bit clearer.
>
> Jason
>



-- 
Anshum Gupta


[ANNOUNCE] Apache Solr 5.3.2 released

2016-01-23 Thread Anshum Gupta
23 January 2016, Apache Solr™ 5.3.2 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.3.2

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

This release contains various bug fixes and optimizations since the 5.3.1
release. The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/5.3.2

Please read CHANGES.txt for a full list of new features and changes:

  https://lucene.apache.org/solr/5_3_2/changes/Changes.html

Solr 5.3.2 includes 11 bug fixes and adds support for configuring the TTL of
PKIAuthenticationPlugin's tokens.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.

-- 
Anshum Gupta


Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-19 Thread Anshum Gupta
If you can wait, I'd suggest being on the bug fix release. It should be out
around the weekend.

On Tue, Jan 19, 2016 at 1:48 PM, Michael Joyner <mich...@newsrx.com> wrote:

> ok,
>
> I just found the 5.4.1 RC2 download, it seems to work ok for a rolling
> upgrade.
>
> I will see about downgrading back to 5.4.0 afterwards to be on an official
> release ...
>
>
>
> On 01/19/2016 04:27 PM, Michael Joyner wrote:
>
>> Hello all,
>>
>> I downloaded 5.4 and started doing a rolling upgrade from a 5.0 solrcloud
>> cluster and discovered that there seems to be a compatibility issue where
>> doing a rolling upgrade from pre-5.4 which causes the 5.4 to fail with
>> unable to determine leader errors.
>>
>> Is there a work around that does not require taking the cluster down to
>> upgrade to 5.4? Should I just stay with 5.3 for now? I need to implement
>> programmatic schema changes in our collection via solrj, and based on what
>> I'm reading this is a very new feature and requires the latest (or near
>> latest) solrcloud.
>>
>> Thanks!
>>
>> -Mike
>>
>
>


-- 
Anshum Gupta


Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-19 Thread Anshum Gupta
Hi Mike,

This is a known issue and would be fixed with the upcoming 5.4.1.
Here's a link to the issue: https://issues.apache.org/jira/browse/SOLR-8561

On Tue, Jan 19, 2016 at 1:27 PM, Michael Joyner <mich...@newsrx.com> wrote:

> Hello all,
>
> I downloaded 5.4 and started doing a rolling upgrade from a 5.0 solrcloud
> cluster and discovered that there seems to be a compatibility issue where
> doing a rolling upgrade from pre-5.4 which causes the 5.4 to fail with
> unable to determine leader errors.
>
> Is there a work around that does not require taking the cluster down to
> upgrade to 5.4? Should I just stay with 5.3 for now? I need to implement
> programmatic schema changes in our collection via solrj, and based on what
> I'm reading this is a very new feature and requires the latest (or near
> latest) solrcloud.
>
> Thanks!
>
> -Mike
>



-- 
Anshum Gupta


Re: [More Like This] Query building

2015-12-29 Thread Anshum Gupta
Feel free to create a JIRA and put up a patch if you can.
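For reference, the field-aware bookkeeping Alessandro proposes below might look roughly like this (a sketch only, not the actual Lucene code):

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;

  // Keep the field -> (term -> frequency) relation instead of one flat map,
  // so each MLT query term stays tied to the field it came from.
  class PerFieldTermFreq {
    private final Map<String, Map<String, Integer>> termFreqPerField = new HashMap<>();

    void addTerm(String field, String term) {
      termFreqPerField
          .computeIfAbsent(field, f -> new HashMap<>())
          .merge(term, 1, Integer::sum);
    }

    Map<String, Integer> termsFor(String field) {
      return termFreqPerField.getOrDefault(field, Collections.emptyMap());
    }
  }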

On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <abenede...@apache.org
> wrote:

> Hi guys,
> While I was exploring the way we build the More Like This query, I
> discovered a part I am not convinced of :
>
>
>
> Let's see how we build the query :
> org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>
> 1) we extract the terms from the interesting fields, adding them to a map :
>
> Map<String, Int> termFreqMap = new HashMap<>();
>
> *(we lose the field -> term relation; we no longer know which field the
> term came from!)*
>
> org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>
> 2) we build the queue that will contain the query terms; at this point we
> connect these terms back to some field, but:
>
> ...
>> // go through all the fields and find the largest document frequency
>> String topField = fieldNames[0];
>> int docFreq = 0;
>> for (String fieldName : fieldNames) {
>>   int freq = ir.docFreq(new Term(fieldName, word));
>>   topField = (freq > docFreq) ? fieldName : topField;
>>   docFreq = (freq > docFreq) ? freq : docFreq;
>> }
>> ...
>
>
> We identify the topField as the field with the highest document frequency
> for the term t.
> Then we build the termQuery :
>
> queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>
> In this way we lose a lot of precision.
> Not sure why we do that.
> I would prefer to keep the relation between terms and fields.
> It could improve the quality of the MLT query a lot.
> If I run the MLT on 2 fields, *description* and *facilities* for example,
> it is likely I want to find documents with similar terms in the
> description and similar terms in the facilities, without mixing things
> up and losing the semantics of the terms.
>
> Let me know your opinion,
>
> Cheers
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Anshum Gupta


Re: Authorization API versus zkcli.sh

2015-12-11 Thread Anshum Gupta
Yes, that's the assumption. The reason why there's a version there is to
optimize reloads, i.e., the Authentication and Authorization plugins are
reloaded only when the version number changes. E.g.:
* Start with Ver 1 for both authentication and authorization
* Make changes to Authentication, the version for this section is updated
to the znode version, while the version for the authorization section is
not changed. This forces the authentication plugin to be reloaded but not
the authorization plugin. Similarly for authorization.

It's a way to optimize the reloads without splitting the definition into 2
znodes, which is also an option.
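For illustration, those per-section markers live inside security.json itself; a trimmed sketch (credential hash elided, version numbers arbitrary):

  {
    "authentication": {
      "class": "solr.BasicAuthPlugin",
      "credentials": { "solr": "<hash elided>" },
      "": { "v": 5 }
    },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "user-role": { "solr": ["admin"] },
      "permissions": [ { "name": "security-edit", "role": "admin" } ],
      "": { "v": 3 }
    }
  }

Only the "v" of the section that was edited moves to the new znode version, so only that plugin reloads.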


On Fri, Dec 11, 2015 at 8:06 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Shouldn't this be the znode version? Why put a version in
> security.json? Or is the idea that the user will upload security.json
> only once and then use the security APIs for all further changes?
>
> On Fri, Dec 11, 2015 at 11:51 AM, Noble Paul <noble.p...@gmail.com> wrote:
> > Please do not put any number. That number is used by the system to
> > optimize loading/reloading plugins. It is not relevant for the user.
> >
> > On Thu, Dec 10, 2015 at 11:52 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
> > <craig.oak...@nih.gov> wrote:
> >> Looking at security.json in Zookeeper, I notice that both the
> authentication section and the authorization section ends with something
> like
> >>
> >> "":{"v":47}},
> >>
> >> Am I correct in thinking that this 47 (in this case) is a version
> number, and that ANY number could be used in the file uploaded to
> security.json using "zkcli.sh -putfile"?
> >>
> >> Or is this some sort of checksum whose value must match some unclear
> criteria?
> >>
> >>
> >> -Original Message-
> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> >> Sent: Sunday, December 06, 2015 8:42 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Authorization API versus zkcli.sh
> >>
> >> There's nothing cluster specific in security.json if you're using those
> >> plugins. It is totally safe to just take the file from one cluster and
> >> upload it for another for things to work.
> >>
> >> On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> >> craig.oak...@nih.gov> wrote:
> >>
> >>> Looking through
> >>>
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> >>> one notices that security.json is initially created by zkcli.sh, and
> then
> >>> modified by means of the Authentication API and the Authorization API.
> By
> >>> and large, this sounds like a good way to accomplish such tasks,
> assuming
> >>> that these APIs do some error checking to prevent corruption of
> >>> security.json
> >>>
> >>> I was wondering about cases where one is cloning an existing Solr
> >>> instance, such as when creating an instance in Amazon Cloud. If one
> has a
> >>> security.json that has been thoroughly tried and successfully tested on
> >>> another Solr instance, is it possible / safe / not-un-recommended to
> use
> >>> zkcli.sh to load the full security.json (as extracted via zkcli.sh
> from the
> >>> Zookeeper of the thoroughly tested existing instance)? Or would the
> >>> official verdict be that the only acceptable way to create
> security.json is
> >>> to load a minimal version with zkcli.sh and then to build the remaining
> >>> components with the Authentication API and the Authorization API (in a
> >>> script, if one wants to automate the process: although such a script
> would
> >>> have to include plain-text passwords)?
> >>>
> >>> I figured there is no harm in asking.
> >>>
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> >
> >
> >
> > --
> > -
> > Noble Paul
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Anshum Gupta


Re: Authorization API versus zkcli.sh

2015-12-06 Thread Anshum Gupta
There's nothing cluster specific in security.json if you're using those
plugins. It is totally safe to just take the file from one cluster and
upload it to another for things to work.
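For example, with the zkcli script that ships with Solr (hosts and paths below are placeholders):

  # pull the tested security.json out of the source cluster's ZooKeeper
  server/scripts/cloud-scripts/zkcli.sh -zkhost src-zk:2181 \
    -cmd getfile /security.json /tmp/security.json

  # push it into the new cluster's ZooKeeper
  server/scripts/cloud-scripts/zkcli.sh -zkhost dst-zk:2181 \
    -cmd putfile /security.json /tmp/security.json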

On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Looking through
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> one notices that security.json is initially created by zkcli.sh, and then
> modified by means of the Authentication API and the Authorization API. By
> and large, this sounds like a good way to accomplish such tasks, assuming
> that these APIs do some error checking to prevent corruption of
> security.json
>
> I was wondering about cases where one is cloning an existing Solr
> instance, such as when creating an instance in Amazon Cloud. If one has a
> security.json that has been thoroughly tried and successfully tested on
> another Solr instance, is it possible / safe / not-un-recommended to use
> zkcli.sh to load the full security.json (as extracted via zkcli.sh from the
> Zookeeper of the thoroughly tested existing instance)? Or would the
> official verdict be that the only acceptable way to create security.json is
> to load a minimal version with zkcli.sh and then to build the remaining
> components with the Authentication API and the Authorization API (in a
> script, if one wants to automate the process: although such a script would
> have to include plain-text passwords)?
>
> I figured there is no harm in asking.
>



-- 
Anshum Gupta


Re: How to config security.json?

2015-12-05 Thread Anshum Gupta
.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClien
> t.java:528)
>
>  at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java
> :234)
>
>  at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java
> :226)
>
>  at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>
>  at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
>
>  at
>
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:
> 207)
>
>  at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
>
>  at
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
>
>  at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>
> 1531614 ERROR (qtp5264648-21) [c:gettingstarted s:shard1 r:core_node1
> x:gettingstarted_shard1_replica2] o.a.s.h.a.ShowFileRequestHandler Can not
> find: /configs/gettingstarted/admin-extra.menu-top.html
>
> 1531614 ERROR (qtp5264648-16) [c:gettingstarted s:shard1 r:core_node1
> x:gettingstarted_shard1_replica2] o.a.s.h.a.ShowFileRequestHandler Can not
> find: /configs/gettingstarted/admin-extra.menu-bottom.html
>
> 1531661 ERROR (qtp5264648-14) [c:gettingstarted s:shard1 r:core_node1
> x:gettingstarted_shard1_replica2] o.a.s.h.a.ShowFileRequestHandler Can not
> find: /configs/gettingstarted/admin-extra.html
>
> ..
>
>
>
> Kind regards,
>
> Byzen Ma
>
>


-- 
Anshum Gupta


Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-30 Thread Anshum Gupta
Hi Craig,

As part of my manual testing for the 5.3 RC, I tried out collection admin
request restriction and update restriction on a single node setup. I don't
have the manual test steps documented, but it wasn't too intensive, I'll
admit. I think the complications involved in stopping specific nodes and
bringing them back up stop us from testing the node restarts as part of the
automated tests but we should find a way and fix that.

I've just found another issue and opened SOLR-8355 for the same and it
involves the "update" permission.

As far as patching 5.3.1 goes, it involves more than just this one patch
and this patch alone wouldn't help you resolve this issue. You'd certainly
need the patch from SOLR-8167. Also, make sure you actually use the
'commit' and not the posted patch as the patch on SOLR-8167 is different
from the commit. I don't think you'd need anything other than those patches
and whatever comes from 8355 to have a patched 5.3.1.

Any help in testing this out would be awesome and thanks for reporting and
following up on the issues!


On Tue, Dec 1, 2015 at 6:09 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you, Anshum and Nobel, for your progress on SOLR-8326
>
> I have a couple questions to tide me over until 5.4 (hoping to test
> security.json a bit further while I wait).
>
> Given that the seven steps (tar xvzf solr-5.3.1.tgz; tar xvzf
> zookeeper-3.4.6.tar.gz; zkServer.sh start zoo_sample.cfg; zkcli.sh -zkhost
> localhost:2181 -cmd putfile /security.json ~/security.json; solr start -e
> cloud -z localhost:2181; solr stop -p 7574 & solr start -c -p 7574 -s
> "example/cloud/node2/solr" -z localhost:2181) demonstrate the problem, are
> there a similar set of steps by which one can load _some_ minimal
> security.json and still be able to stop & successfully restart one node of
> the cluster? (I am wondering what steps were used in the original testing
> of 5.3.1)
>
> Also, has it been verified that the SOLR-8326 patch resolves the
> ADDREPLICA bug in addition to the
> shutdown-&-restart-one-node-while-keeping-another-node-running bug?
>
> Also, would it make sense for me to download solr-5.3.1-src.tgz and (in a
> test environment) make the changes described in the latest attachment to
> SOLR-8326? Or would it be more advisable just to wait for 5.4? I don't know
> what may be involved in compiling a new solr.war from the source code.
>
> Thanks again
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, November 24, 2015 1:25 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> bq: I don't suppose there is an ETA for 5.4?
>
> Actually, 5.4 is probably in the works within the next month. I'm not
> the one cutting the
> release, but there's some rumors that a label will be cut this week,
> then the "usual"
> process is a week or two (sometimes more if bugs are flushed out) before
> the
> official release.
>
> Call it the first of the year for safety's sake, but that's a guess.
>
> Best,
> Erick
>
> On Tue, Nov 24, 2015 at 10:22 AM, Oakley, Craig (NIH/NLM/NCBI) [C]
> <craig.oak...@nih.gov> wrote:
> > Thanks for the reply,
> >
> > I don't suppose there is an ETA for 5.4?
> >
> >
> > Thanks again
> >
> > -Original Message-
> ...
>



-- 
Anshum Gupta


Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-24 Thread Anshum Gupta
Yes, it certainly is a PKI issue.

On Tue, Nov 24, 2015 at 7:59 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you for the reply
>
> Trying those exact commands, I'm still getting the same issue
> tar xvzf /net/sybdev11/export/home/sybase/Distr/Solr/solr-5.3.1.tgz
> tar xvzf /net/sybdev11/export/home/sybase/Distr/Solr/zookeeper-3.4.6.tar.gz
> cd zookeeper-3.4.6/
> bin/zkServer.sh start zoo_sample.cfg
> cd ..
> solr-5.3.1/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181
> -cmd putfile /security.json
> PREVsolr-5.3.1/server/scripts/cloud-scripts/security.json
> solr-5.3.1/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181
> -cmd list
> solr-5.3.1/bin/solr start -e cloud -z localhost:2181
> cd solr-5.3.1/
> bin/solr stop -p 7574
> bin/solr start -c -p 7574 -s "example/cloud/node2/solr" -z localhost:2181
> tail -f example/cloud/node2/logs/solr.log
>
> The -cmd list shows
> / (2)
> DATA:
>
>  /zookeeper (1)
>  DATA:
>
>  /security.json (0)
>  DATA:
>
>  
> {"authentication":{"class":"solr.BasicAuthPlugin","credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}},"authorization":{"class":"solr.RuleBasedAuthorizationPlugin","user-role":
>  {"solr":["admin"]}
>
>  ,"permissions":[
>  {"name":"security-edit","role":"admin"}
>
>  ]}}
>
>
> While the output of tail contains
> ERROR - 2015-11-24 10:45:54.796; [c:gettingstarted s:shard1 r:core_node4
> x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException;
> Error while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.http.ParseException: Invalid content type:
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> Caused by: org.apache.http.ParseException: Invalid content type:
> at org.apache.http.entity.ContentType.parse(ContentType.java:273)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:512)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Monday, November 23, 2015 7:24 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> Yes, I see the same issue. I'll update the JIRA and drill down. Thanks.
>
> On Mon, Nov 23, 2015 at 4:18 PM, Anshum Gupta <ans...@anshumgupta.net>
> wrote:
>
> > To restart solr, you should instead use something like:
> > bin/solr start -c -p 8983 -s "example/cloud/node1/solr" -z localhost:2181
> > or
> > bin/solr start -c -p 7574 -s "example/cloud/node2/solr" -z localhost:2181
> >
> > I've seen others report the same exception but never ran into this one
> > myself. Let me try this out.
> >
> >
> >
> > On Mon, Nov 23, 2015 at 2:55 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> > craig.oak...@nih.gov> wrote:
> >
> >> FWIW
> >>
> >> I am getting fairly consistent results that if I follow the SOLR-8326
> >> procedure just up through the step of "solr-5.3.1/bin/solr start -e
> cloud
> >> -z localhost:2181": if I then stop just one node (either "./solr stop -p
> >> 7574" or "./solr stop -p 8983") and then restart that same node (using
> the
> >> command suggested by "solr-5.3.1/bin/solr start -e cloud -z
> >> localhost:2181"), then the solr.log for the stopped-and-restarted node
> gets
> >&

Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-23 Thread Anshum Gupta
  at org.eclipse.jetty.server.Server.handle(Server.java:499)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
> In this case the string is just "r?", but usually it is a longer string of
> control characters.
>
> If I shutdown _both_ nodes and restart _one_, and then allow it to be
> "Waiting until we see more replicas up" until it recognizes itself as
> leader, and _then_ restart the other node -- in this case it successfully
> starts.
>
> Is there some necessary environment tweaking? The symptoms seem similar
> whether I use the security.json from SOLR-8326 or the security.json from
> the Wiki (with the comma repositioned).
>
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C]
> Sent: Friday, November 20, 2015 6:59 PM
> To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
> Subject: RE: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> Thanks
>
> It seems to work when there is no security.json, so perhaps there's some
> typo in the initial version.
>
> I notice that the version you sent is different from the documentation at
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> in that the Wiki version has "permissions" before "user-role": I also
> notice that (at least as of right this moment) the Wiki version has a comma
> at the end of '"user-role":{"solr":"admin"},' even though it is at the end:
> and I notice that the Wiki version seems to lack a comma between the
> "permissions" section and the "user-role" section. I just now also noticed
> that the version you sent has '"user-role":{"solr":["admin"]}' (with square
> brackets) whereas the Wiki does not have square brackets.
>
> The placement of the comma definitely looks wrong in the Wiki at the
> moment (though perhaps someone might correct the Wiki before too long).
> Other than that, I don’t know whether the order and/or the square brackets
> make a difference. I can try with different permutations.
>
> Thanks again
>
> P.S. for the record, the Wiki currently has
> {
> "authentication":{
>"class":"solr.BasicAuthPlugin",
>"credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> },
> "authorization":{
>"class":"solr.RuleBasedAuthorizationPlugin",
>"permissions":[{"name":"security-edit",
>   "role":"admin"}]
>"user-role":{"solr":"admin"},
> }}
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Friday, November 20, 2015 6:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> This seems unrelated and more like a user error somewhere. Can you just
> follow the steps, without any security settings i.e. not even uploading
> security.json, and see if you still see this? Sorry, but I don't have access
> to the code right now; I'll try and look at this later tonight.
>
> On Fri, Nov 20, 2015 at 3:07 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> craig.oak...@nih.gov> wrote:
>
> > Thank you for opening SOLR-8326
> >
> > As a side note, in the procedure you listed, even before adding the
> > collection-admin-edit authorization, I'm already hitting trouble:
> stopping
> > and restarting a node results in the following
> >
> > INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> > x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy;
> > Publishing state of core solr8326_shard2_replica1 as recovering, leader
> is
> > http://{IP-address-redacted}:8983/solr/solr8326_shard2_replica2/ and I
> am
> > http://{IP-address-redacted}:7574/solr/solr8326_shard2_replica1/
> > INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> > x:solr8326_shard2_replica1] org.apache.solr.cloud.ZkController;
> publishing
> > state=recovering
> > INFO  - 2015-11-20 22:48:41.278; [c:solr8326 s:shard1 r:core_node3
> > x:sol

Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-23 Thread Anshum Gupta
Yes, I see the same issue. I'll update the JIRA and drill down. Thanks.

On Mon, Nov 23, 2015 at 4:18 PM, Anshum Gupta <ans...@anshumgupta.net>
wrote:

> To restart solr, you should instead use something like:
> bin/solr start -c -p 8983 -s "example/cloud/node1/solr" -z localhost:2181
> or
> bin/solr start -c -p 7574 -s "example/cloud/node2/solr" -z localhost:2181
>
> I've seen others report the same exception but never ran into this one
> myself. Let me try this out.
>
>
>
> On Mon, Nov 23, 2015 at 2:55 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> craig.oak...@nih.gov> wrote:
>
>> FWIW
>>
>> I am getting fairly consistent results that if I follow the SOLR-8326
>> procedure just up through the step of "solr-5.3.1/bin/solr start -e cloud
>> -z localhost:2181": if I then stop just one node (either "./solr stop -p
>> 7574" or "./solr stop -p 8983") and then restart that same node (using the
>> command suggested by "solr-5.3.1/bin/solr start -e cloud -z
>> localhost:2181"), then the solr.log for the stopped-and-restarted node gets
>> such stack traces as
>> ERROR - 2015-11-23 21:49:28.663; [c:gettingstarted s:shard2 r:core_node3
>> x:gettingstarted_shard2_replica2] org.apache.solr.common.SolrException;
>> Error while trying to recover.:java.util.concurrent.ExecutionException:
>> org.apache.http.ParseException: Invalid content type:
>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> at
>> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
>> at
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
>> at
>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>> Caused by: org.apache.http.ParseException: Invalid content type:
>> at org.apache.http.entity.ContentType.parse(ContentType.java:273)
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:512)
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> While the node which stayed up the whole time starts getting such stack
>> traces as
>> ERROR - 2015-11-23 21:57:46.019; [c:gettingstarted s:shard2 r:core_node3
>> x:gettingstarted_shard2_replica2]
>> org.apache.solr.security.PKIAuthenticationPlugin; Invalid time r?
>> java.lang.NumberFormatException: For input string: "r?"
>> at
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Long.parseLong(Long.java:589)
>> at java.lang.Long.parseLong(Long.java:631)
>> at
>> org.apache.solr.security.PKIAuthenticationPlugin.doAuthenticate(PKIAuthenticationPlugin.java:128)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.authenticateRequest(SolrDispatchFilter.java:252)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:186)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>> at
>&

Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
From my tests, it seems like the 'read' permission interferes with
replication, and so ADDREPLICA also fails. You're also bound to run into
issues if you have the 'read' permission set up and restart your cluster,
provided you have a collection that has a replication factor > 1 for at
least one shard.

I'll create a JIRA for this and mark it to be a blocker for 5.4. Thanks for
bringing this up.
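For anyone reproducing this: 'read' is one of the predefined permission names, and setting it up looks something like the following, with host, credentials, and role name as placeholders, following the curl pattern used elsewhere in this thread:

  curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization \
    -H 'Content-type:application/json' \
    -d '{"set-permission": {"name": "read", "role": "reader"}}'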


On Thu, Nov 19, 2015 at 12:43 PM, Anshum Gupta <ans...@anshumgupta.net>
wrote:

> I'll try out what you did later in the day, as soon as I get time but why
> exactly are you creating cores manually? Seems like you manually create a
> core and the try to add a replica. Can you try using the Collections API to
> create a collection?
>
> Starting Solr 5.0, the only supported way to create a new collection is
> via the Collections API. Creating a core would lead to a collection
> creation but that's not really supported. It was just something that was
> done when there were no Collections API.
>
>
> On Thu, Nov 19, 2015 at 12:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> craig.oak...@nih.gov> wrote:
>
>> I tried again with the following security.json, but the results were the
>> same:
>>
>> {
>>   "authentication":{
>> "class":"solr.BasicAuthPlugin",
>> "credentials":{
>>   "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
>>   "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE=
>> 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
>> "":{"v":9}},
>>   "authorization":{
>> "class":"solr.RuleBasedAuthorizationPlugin",
>> "user-role":{
>>   "solr":[
>> "admin",
>> "read",
>> "xmpladmin",
>> "xmplgen",
>> "xmplsel"],
>>   "solruser":[
>> "read",
>> "xmplgen",
>> "xmplsel"]},
>> "permissions":[
>>   {
>> "name":"security-edit",
>> "role":"admin"},
>>   {
>> "name":"xmpl_admin",
>> "collection":"xmpl",
>> "path":"/admin/*",
>> "role":"xmpladmin"},
>>   {
>> "name":"xmpl_sel",
>> "collection":"xmpl",
>> "path":"/select/*",
>> "role":null},
>>   {
>>  "name":"all-admin",
>>  "collection":null,
>>  "path":"/*",
>>  "role":"xmplgen"},
>>   {
>>  "name":"all-core-handlers",
>>  "path":"/*",
>>  "role":"xmplgen"}],
>> "":{"v":42}}}
>>
>> -Original Message-
>> From: Oakley, Craig (NIH/NLM/NCBI) [C]
>> Sent: Thursday, November 19, 2015 1:46 PM
>> To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
>> Subject: RE: Re:Re: Implementing security.json is breaking ADDREPLICA
>>
>> I note that the thread called "Security Problems" (most recent post by
>> Noble Paul) seems like it may help with much of what I'm trying to do. I
>> will see to what extent that may help.
>>
>
>
>
> --
> Anshum Gupta
>



-- 
Anshum Gupta


Re: shard range is empty...

2015-11-20 Thread Anshum Gupta
This uses the Collections API and shouldn't have led to that state. Have
you had similar issues before?

I'm also wondering if you already had something from previous runs/installs
on the fs/zk.

On Fri, Nov 20, 2015 at 10:26 AM, Don Bosco Durai <bo...@apache.org> wrote:

> Anshum,
>
> Thanks for the workaround. It resolved my issue.
>
> Here is the command I used. It is pretty standard and has worked for me
> almost all the time (so far)...
> bin/solr create -c my_collection -d
> /tmp/solr_configsets/my_collection/conf -s 3 -rf 1
>
>
> Thanks
>
> Bosco
>
>
>
>
>
> On 11/20/15, 9:56 AM, "Anshum Gupta" <ans...@anshumgupta.net> wrote:
>
> >You can manually update the cluster state so that the range for shard1
> says
> >8000-d554. Also remove the "parent" tag from there.
> >
> >Can you tell me how did you create this collection ? This shouldn't really
> >happen unless you didn't use the Collections API to create the collection.
> >
> >
> >
> >
> >
> >On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai <bo...@apache.org>
> wrote:
> >
> >> I created a 3 shard cluster, but seems for one of the shard, the range
> is
> >> empty. Anyway to fix it without deleting and recreating the collection?
> >>
> >> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
> >> apache.solr.client.solrj.impl.CloudSolrClient
> (CloudSolrClient.java:902) -
> >> Request to collection my_collection failed due to (400)
> >> org.apache.solr.common.SolrException: No active slice servicing hash
> code
> >> b637e7f1 in DocCollection(my_collection)={
> >>   "replicationFactor":"1",
> >>   "shards":{
> >> "shard2":{
> >>   "range":"d555-2aa9",
> >>   "state":"active",
> >>   "replicas":{"core_node2":{
> >>   "core":"my_collection_shard2_replica1",
> >>   "base_url":"http://172.22.64.65:8886/solr;,
> >>   "node_name":"172.22.64.65:8886_solr",
> >>   "state":"active",
> >>   "leader":"true"}}},
> >> "shard3":{
> >>   "range":"2aaa-7fff",
> >>   "state":"active",
> >>   "replicas":{"core_node3":{
> >>   "core":"my_collection_shard3_replica1",
> >>   "base_url":"http://172.22.64.64:8886/solr;,
> >>   "node_name":"172.22.64.64:8886_solr",
> >>   "state":"active",
> >>   "leader":"true"}}},
> >> "shard1":{
> >>   "parent":null,
> >>   "range":null,
> >>   "state":"active",
> >>   "replicas":{"core_node4":{
> >>   "core":"my_collection_shard1_replica1",
> >>   "base_url":"http://172.22.64.63:8886/solr;,
> >>   "node_name":"172.22.64.63:8886_solr",
> >>   "state":"active",
> >>   "leader":"true",
> >>   "router":{"name":"compositeId"},
> >>   "maxShardsPerNode":"1",
> >>   "autoAddReplicas":"false"}, retry? 0
> >>
> >> Thanks
> >>
> >> Bosco
> >>
> >>
> >>
> >
> >
> >--
> >Anshum Gupta
>
>


-- 
Anshum Gupta


Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Starting
> Replication Recovery.
> INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Begin
> buffering updates.
> INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.update.UpdateLog; Starting to
> buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Attempting
> to replicate from http://
> {IP-address-redacted}:4685/solr/xmpl3_shard1_replica1/.
> ERROR - 2015-11-20 16:56:25.292; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.common.SolrException; Error while
> trying to
> recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at 
> http://{IP-address-redacted}:4685/solr/xmpl3_shard1_replica1:
> Expected mime type application/octet-stream but got text/html:
>
> Error 401 Unauthorized request, Response code: 401
>
> HTTP ERROR 401
> Problem accessing /solr/xmpl3_shard1_replica1/update. Reason:
> Unauthorized request, Response code: 401
> Powered by Jetty://
>
>
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
> at
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>
> INFO  - 2015-11-20 16:56:25.292; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.update.UpdateLog; Dropping
> buffered updates FSUpdateLog{state=BUFFERING, tlog=null}
> ERROR - 2015-11-20 16:56:25.293; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Recovery
> failed - trying again... (2)
> INFO  - 2015-11-20 16:56:25.293; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Wait 8.0
> seconds before trying to recover again (3)
>
>
> Below is a list of the steps I took.
>
> ./zkcli.sh --zkhost localhost:4545 -cmd makepath /solr/xmpl3
> ./zkcli.sh --zkhost localhost:4545/solr/xmpl3 -cmd putfile /security.json
> ~/solr/security151119a.json
> ./zkcli.sh --zkhost localhost:4545/solr/xmpl3 -cmd upconfig -confdir
> ../../solr/configsets/basic_configs/conf -confname xmpl3
> cd ../../../bin/
> ./solr -c -p 4695 -d ~dbman/solr/straight531outofbox/solr-5.3.1/server/ -z
> localhost:4545/solr/xmpl3 -s
> ~dbman/solr/straight531outofbox/solr-5.3.1/example/solr
> ./solr -c -p 4685 -d ~dbman/solr/straight531outofbox/solr-5.3.1/server/ -z
> localhost:4545/solr/xmpl3 -s
> ~dbman/solr/straight531outofbox/solr-5.3.1/server/solr
> curl -u solr:SolrRocks '
> http://nosqltest11:4685/solr/admin/collections?action=CREATE&name=xmpl3&numShards=1&replicationFactor=1&createNodeSet={IP-address-redacted}:4685_solr
> '
> curl -u solr:SolrRocks '
> http://nosqltest11:4685/solr/admin/collections?action=ADDREPLICA&collection=xmpl3&shard=shard1&node={IP-address-redacted}:4695_solr&wt=json&indent=true
> '
>
>
>
>
> Can you provide a list of steps to take in an out-of-the-box directory
> tree whereby ADDREPLICA _will_ work with security.json already in place?
>
>
>
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Thursday, November 19, 2015 3:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> I'll try out what you did later in the day, as soon as I get time but why
> exactly are you creating cores manually? Seems like you manually create a
> core and the try to add a replica. Can you try using the Collections API to
> create a collection?
>
> Starting Solr 5.0, the only supported way to create a new collection is via
> the Collections API. Creating a core would lead to a collection creation
> but that's not really supported. It was just something that was done when
> there was no Collections API.
>
>
> On Thu, Nov 19, 2015 at 12:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <

Re: shard range is empty...

2015-11-20 Thread Anshum Gupta
You can manually update the cluster state so that the range for shard1 says
8000-d554. Also remove the "parent" tag from there.

Can you tell me how you created this collection? This shouldn't really
happen unless you didn't use the Collections API to create the collection.
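For reference, the repaired shard1 entry should end up shaped like the other shards; a sketch (the range value is the one I give above — verify the exact hex boundaries against your other shards, and note that the "parent" key is gone and "range" is no longer null):

  "shard1":{
    "range":"8000-d554",
    "state":"active",
    "replicas":{"core_node4":{ ... }}}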





On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai <bo...@apache.org> wrote:

> I created a 3 shard cluster, but seems for one of the shard, the range is
> empty. Anyway to fix it without deleting and recreating the collection?
>
> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
> apache.solr.client.solrj.impl.CloudSolrClient (CloudSolrClient.java:902) -
> Request to collection my_collection failed due to (400)
> org.apache.solr.common.SolrException: No active slice servicing hash code
> b637e7f1 in DocCollection(my_collection)={
>   "replicationFactor":"1",
>   "shards":{
> "shard2":{
>   "range":"d555-2aa9",
>   "state":"active",
>   "replicas":{"core_node2":{
>   "core":"my_collection_shard2_replica1",
>   "base_url":"http://172.22.64.65:8886/solr;,
>   "node_name":"172.22.64.65:8886_solr",
>   "state":"active",
>   "leader":"true"}}},
> "shard3":{
>   "range":"2aaa-7fff",
>   "state":"active",
>   "replicas":{"core_node3":{
>   "core":"my_collection_shard3_replica1",
>   "base_url":"http://172.22.64.64:8886/solr;,
>   "node_name":"172.22.64.64:8886_solr",
>   "state":"active",
>   "leader":"true"}}},
> "shard1":{
>   "parent":null,
>   "range":null,
>   "state":"active",
>   "replicas":{"core_node4":{
>   "core":"my_collection_shard1_replica1",
>   "base_url":"http://172.22.64.63:8886/solr;,
>   "node_name":"172.22.64.63:8886_solr",
>   "state":"active",
>   "leader":"true",
>   "router":{"name":"compositeId"},
>   "maxShardsPerNode":"1",
>   "autoAddReplicas":"false"}, retry? 0
>
> Thanks
>
> Bosco
>
>
>


-- 
Anshum Gupta


Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
trying again... (4)
> INFO  - 2015-11-20 22:48:41.300; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait
> 32.0 seconds before trying to recover again (5)
> ERROR - 2015-11-20 22:48:41.300; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.common.SolrException; Error
> while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.http.ParseException: Invalid content type:
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> Caused by: org.apache.http.ParseException: Invalid content type:
> at org.apache.http.entity.ContentType.parse(ContentType.java:273)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:512)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> ERROR - 2015-11-20 22:48:41.318; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Recovery failed - trying again... (4)
> INFO  - 2015-11-20 22:48:41.318; [   ]
> org.apache.solr.common.cloud.ZkStateReader; Updating data for solr8326 to
> ver 26
> INFO  - 2015-11-20 22:48:41.319; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait
> 32.0 seconds before trying to recover again (5)
>
>
> I would not be surprised if this were to be some unrelated issue (the
> symptoms are quite different)
>
>
>
> Thanks again
>
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Friday, November 20, 2015 1:31 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> The Collections API was available before November 2014, if that is when you
> took the class. However, it was only with Solr 5.0 (released in Feb 2015)
> that the supported mechanism to create a collection was restricted to the
> Collections API.
>
> Here is the list of steps that you'd need to run to see that things are
> fine for you without the read permission:
> * Untar and setup Solr, don't start it yet
> * Start clean zookeeper
> * Put the security.json in zk, without anything other than a security-edit
> permission. Find the content of the file below. Upload it using your own zk
> client or through the solr script:
> > solr-5.3.1/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181
> -cmd putfile /security.json ~/security.json
>
> security.json:
>
> {"authentication":{"class":"solr.BasicAuthPlugin","credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
>
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}},"authorization":{"class":"solr.RuleBasedAuthorizationPlugin","user-role":{"solr":["admin"]},"permissions":[{"name":"security-edit","role":"admin"}]}}
>
> * Start solr:
> > solr-5.3.1/bin/solr start -e cloud -z localhost:2181
>
> You would need to key in a few things e.g. #nodes and ports, leave them at
> the default values of 2 nodes and 8983/7574, unless you want to run Solr on
> a different port. Then let it create a default collection to just make sure
> that everything works fine.
>
> * Add the collection-admin-edit command:
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> -H 'Content-type:application/json' -d '{"set-permission" :
> {"name":"collection-admin-edit", "role":"admin"}}'
>
> At this point, everything should be working fine. Restarting the nodes
>  should also work fine. You can try 2 things at this point:
> 1. Create a new collection with 1 sha

Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-19 Thread Anshum Gupta
I'll try out what you did later in the day, as soon as I get time but why
exactly are you creating cores manually? Seems like you manually create a
core and then try to add a replica. Can you try using the Collections API to
create a collection?

Starting Solr 5.0, the only supported way to create a new collection is via
the Collections API. Creating a core would lead to a collection creation
but that's not really supported. It was just something that was done when
there was no Collections API.
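For example, a collection create through the API looks something like this (name and sizing are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2'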


On Thu, Nov 19, 2015 at 12:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> I tried again with the following security.json, but the results were the
> same:
>
> {
>   "authentication":{
> "class":"solr.BasicAuthPlugin",
> "credentials":{
>   "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
>   "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE=
> 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
> "":{"v":9}},
>   "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "user-role":{
>   "solr":[
> "admin",
> "read",
> "xmpladmin",
> "xmplgen",
> "xmplsel"],
>   "solruser":[
> "read",
> "xmplgen",
> "xmplsel"]},
> "permissions":[
>   {
> "name":"security-edit",
> "role":"admin"},
>   {
> "name":"xmpl_admin",
> "collection":"xmpl",
> "path":"/admin/*",
> "role":"xmpladmin"},
>   {
> "name":"xmpl_sel",
> "collection":"xmpl",
> "path":"/select/*",
> "role":null},
>   {
>  "name":"all-admin",
>  "collection":null,
>  "path":"/*",
>  "role":"xmplgen"},
>   {
>  "name":"all-core-handlers",
>  "path":"/*",
>  "role":"xmplgen"}],
> "":{"v":42}}}
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C]
> Sent: Thursday, November 19, 2015 1:46 PM
> To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
> Subject: RE: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> I note that the thread called "Security Problems" (most recent post by
> Noble Paul) seems like it may help with much of what I'm trying to do. I
> will see to what extent that may help.
>



-- 
Anshum Gupta


Re: Large multivalued field and overseer problem

2015-11-19 Thread Anshum Gupta
Hi Olivier,

A few things that you should know:
1. The Overseer is at a per cluster level and not at a per-collection level.
2. Also, documents/fields/etc. should have zero impact on the Overseer
itself.

So, while the upgrade to a more recent Solr version comes with a lot of
good stuff, the cluster state and the Overseer are not what you should be
looking at. Also, failing recovery has nothing to do with the Overseer.

Now, more detail about the problem might help people here to help you better.

Can you tell something about your zookeeper ? version, #nodes ?

Also, is the network between the Solr nodes and zk fine ?

You mention that you're seeing this issue while indexing. How are you
indexing (CloudSolrClient ? ) and what are your indexing settings
(auto-commit etc.).

Most importantly, what is the heap size of the Solr processes?


On Thu, Nov 19, 2015 at 12:43 PM, Olivier <olivau...@gmail.com> wrote:

> Hi,
>
> We have a Solrcloud cluster with 3 nodes (4 processors, 24 Gb RAM per
> node).
> We have 3 shards per node and the replication factor is 3. We host 3
> collections, the biggest is about 40K documents only.
> The most important thing is a multivalued field with about 200K to 300K
> values per document (each value is a kind of reference product of type
> String).
> We have some very big issues with our SolrCloud cluster. It crashes
> entirely, very frequently, at indexing time. It starts with an overseer
> issue:
>
> Overseer session expired: KeeperErrorCode = Session expired for
> /overseer_elect/leader
>
> Then another node is elected overseer. But the recovery phase seems to
> fail indefinitely. It seems that communication between the overseer
> and ZK is impossible.
> And after a short period of time, the whole cluster is unavailable (JVM
> out-of-memory error). And we have to restart it.
>
> So I wanted to know if we can continue to use huge multivalued field with
> SolrCloud.
> We are on Solr 4.10.4 for now, do you think that if we upgrade to Solr 5,
> with an overseer per collection it can fix our issues ?
> Or do we have to rethink the schema to avoid this very large multivalued
> field ?
>
> Thanks,
> Best,
>
> Olivier
>



-- 
Anshum Gupta


Re: Implementing security.json is breaking ADDREPLICA

2015-11-18 Thread Anshum Gupta
28)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
> at
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>
> INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Dropping buffered
> updates FSUpdateLog{state=BUFFERING, tlog=null}
> ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Recovery
> failed - trying again... (2)
> INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait 8.0
> seconds before trying to recover again (3)
>
>
>
> And (after modifying Logging Levels), the solr.log of the node which
> already had a core gets errors such as the following:
>
> 2015-11-17 21:03:50.743 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server
> REQUEST GET /solr/tpl/cloud.html on HttpChannelOverHttp@37cf94f4
> {r=1,c=false,a=DISPATCHED,uri=/solr/tpl/cloud.html}
> 2015-11-17 21:03:50.744 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server
> RESPONSE /solr/tpl/cloud.html  200 handled=true
> 2015-11-17 21:03:50.802 DEBUG (qtp59559151-91) [   ] o.e.j.s.Server
> REQUEST GET /solr/zookeeper on HttpChannelOverHttp@37cf94f4
> {r=2,c=false,a=DISPATCHED,uri=/solr/zookeeper}
> 2015-11-17 21:03:50.803 INFO  (qtp59559151-91) [   ] o.a.s.s.HttpSolrCall
> userPrincipal: [null] type: [UNKNOWN], collections: [], Path: [/zookeeper]
> 2015-11-17 21:03:50.831 DEBUG (qtp59559151-91) [   ] o.e.j.s.Server
> RESPONSE /solr/zookeeper  200 handled=true
> 2015-11-17 21:03:50.837 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server
> REQUEST GET /solr/zookeeper on HttpChannelOverHttp@37cf94f4
> {r=3,c=false,a=DISPATCHED,uri=/solr/zookeeper}
> 2015-11-17 21:03:50.838 INFO  (qtp59559151-87) [   ] o.a.s.s.HttpSolrCall
> userPrincipal: [null] type: [UNKNOWN], collections: [], Path: [/zookeeper]
> 2015-11-17 21:03:50.841 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server
> RESPONSE /solr/zookeeper  200 handled=true
> 2015-11-17 21:03:50.857 DEBUG (qtp59559151-91) [   ] o.e.j.s.Server
> REQUEST GET /solr/zookeeper on HttpChannelOverHttp@37cf94f4
> {r=4,c=false,a=DISPATCHED,uri=/solr/zookeeper}
> 2015-11-17 21:03:50.858 INFO  (qtp59559151-91) [   ] o.a.s.s.HttpSolrCall
> userPrincipal: [null] type: [UNKNOWN], collections: [], Path: [/zookeeper]
> 2015-11-17 21:03:50.860 DEBUG (qtp59559151-91) [   ] o.e.j.s.Server
> RESPONSE /solr/zookeeper  200 handled=true
> 2015-11-17 21:03:54.162 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server
> REQUEST POST /solr/xmpl/update on HttpChannelOverHttp@1cf967f0
> {r=1,c=false,a=DISPATCHED,uri=/solr/xmpl/update}
> 2015-11-17 21:03:54.164 INFO  (qtp59559151-87) [c:xmpl s:shard1
> r:core_node1 x:xmpl] o.a.s.s.HttpSolrCall userPrincipal: [null] type:
> [WRITE], collections: [xmpl,], Path: [/update]
> 2015-11-17 21:03:54.164 DEBUG (qtp59559151-87) [c:xmpl s:shard1
> r:core_node1 x:xmpl] o.e.j.s.Server RESPONSE /solr/xmpl/update  401
> handled=true
>
>
>
> My impression from Anshum Gupta's 10/16/15 talk in Austin at the Solr
> conference was that this was supposed to work. It does seem that one might
> be able to add security to replication, but there does not seem to be a way
> to add SolrCloud replication to this type of security.
>
> Also, on a side note, I notice that http://hostname:port/solr/ does bring
> up the GUI without prompting for a password: the Security team here would
> like us to implement security.json in such a way that even the front page
> of the GUI will require a password (although they will allow us to allow
> select access without a password): I have not yet found a way via
> security.json to implement that a password would be required in order to
> access the GUI front page.
>
>
>
> Please advise.
>
>


-- 
Anshum Gupta


Re: DocValues error

2015-11-13 Thread Anshum Gupta
Hi Devansh,

Yes, you'd need to reindex your data in order to use DocValues. It's
highlighted here in the official ref guide:

https://cwiki.apache.org/confluence/display/solr/DocValues
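For anyone hitting the same error: the usual sequence is to change the field definition and then rebuild the index. A sketch, assuming standard attributes for the rest of the definition:

  <!-- schema.xml: add docValues to the existing field -->
  <field name="lastpublishdate" type="tdate" indexed="true" stored="true" docValues="true"/>

After reloading, wipe and reindex, e.g.:

  curl 'http://localhost:8983/solr/<collection>/update?commit=true' \
    -H 'Content-type:text/xml' -d '<delete><query>*:*</query></delete>'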

On Fri, Nov 13, 2015 at 10:00 AM, Dhutia, Devansh <ddhu...@gannett.com>
wrote:

> We have an existing collection with a field called lastpublishdate of type
> tdate. It already has a lot of data indexed, and we want to add docValues
> to improve our sorting performance on the field.
>
> The old field definition was:
>
>  <field name="lastpublishdate" type="tdate" ... />
>
> We recently changed it to
>
>  <field name="lastpublishdate" type="tdate" ... docValues="true"/>
>
> Is that considered a breaking change? Upon deploying the schema &
> reloading the collection, sorting on the field fails with the following error:
>
> unexpected docvalues type NONE for field 'lastpublishdate'
> (expected=NUMERIC). Use UninvertingReader or index with docvalues.
>
> Do we really need to wipe & rebuild the entire index to add docValues to
> an existing dataset?
>
> Thanks
>



-- 
Anshum Gupta


Re: Solr Search: Access Control / Role based security

2015-11-10 Thread Anshum Gupta
I think both of those overlap at some point but aren't really directly
related or problems that would be solved in the same manner.

Document-level security, though it can be implemented using custom
authentication/authorization plugins, is something a fair number of users
handle with ManifoldCF instead. So it's totally your pick.

I'm not 100% sure, but I think using a custom authentication/authorization
plugin + an update request processor is more work than using ManifoldCF for
that purpose.
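The token-based filtering Alessandro describes below might look like this on the query side (a sketch; the allow_token/deny_token field names come from his example, and the token values would come from the authority service for the logged-in user):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  class SecureSearch {
    // Append the access filters server-side so users cannot tamper with them;
    // 'tokens' might be a string like "dept_eng OR dept_all".
    static QueryResponse search(SolrClient client, String userQuery, String tokens)
        throws Exception {
      SolrQuery q = new SolrQuery(userQuery);
      q.addFilterQuery("allow_token:(" + tokens + ")");  // must hold an allow token
      q.addFilterQuery("-deny_token:(" + tokens + ")");  // and hold no deny token
      return client.query(q);
    }
  }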

On Tue, Nov 10, 2015 at 10:37 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Thanks everyone for the suggestions.
>
> Hi Noble - Were there any thoughts made on utilizing Apache ManifoldCF
> while developing Authentication/Authorization plugins or anything to add
> there.
>
> Thanks,
> Susheel
>
> On Tue, Nov 10, 2015 at 5:01 AM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > I've been working for a while with Apache ManifoldCF and Enterprise
> Search
> > in Solr ( with Document level security) .
> > Basically you can add a couple of extra fields , for example :
> >
> > allow_token : containing all the tokens that can view the document
> > deny_token : containing all the tokens that are denied to view the
> document
> >
> > Apache ManifoldCF provides an integration that add an additional layer,
> and
> > is able to combine different data sources permission schemes.
> > The Authority Service endpoint will take in input the user name and
> return
> > all the allow_token values and deny_token.
> > At this point you can append the related filter queries to your queries
> and
> > be sure that the user will only see what is supposed to see.
> >
> > It's basically an extension of the strategy you were proposing, role
> based.
> > Of course keep protected your endpoints and avoid users to put custom fq,
> > or all your document security model would be useless :)
> >
> > Cheers
> >
> >
> > On 9 November 2015 at 21:52, Scott Stults <
> > sstu...@opensourceconnections.com
> > > wrote:
> >
> > > Susheel,
> > >
> > > This is perfectly fine for simple use-cases and has the benefit that
> the
> > > filterCache will help things stay nice and speedy. Apache ManifoldCF
> > goes a
> > > bit further and ties back to your authentication and authorization
> > > mechanism:
> > >
> > >
> > >
> >
> http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model
> > >
> > >
> > > k/r,
> > > Scott
> > >
> > > On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar <susheel2...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have seen couple of use cases / need where we want to restrict
> result
> > > of
> > > > search based on role of a user.  For e.g.
> > > >
> > > > - if user role is admin, any document from the search result will be
> > > > returned
> > > > - if user role is manager, only documents intended for managers will
> be
> > > > returned
> > > > - if user role is worker, only documents intended for workers will be
> > > > returned
> > > >
> > > > Typical practice is to tag the documents with the roles (using a
> > > > multi-valued field) during indexing and then append a filter query
> > > > during search to restrict results based on roles.
> > > >
> > > > Wondering if there is any other better way out there and if this
> common
> > > > requirement should be added as a Solr feature/plugin.
> > > >
> > > > The current security plugins are more about making Solr
> > > > APIs/resources secure, not about securing/controlling data during
> > > > search.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> > > >
> > > >
> > > > Please share your thoughts.
> > > >
> > > > Thanks,
> > > > Susheel
> > > >
> > >
> > >
> > >
> > > --
> > > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> > LLC
> > > | 434.409.2780
> > > http://www.opensourceconnections.com
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
Anshum Gupta


Re: Security Problems

2015-11-10 Thread Anshum Gupta
The reason we bypass that is so that we don't hit the authentication
plugin for every request that comes in for static content. I think we could
call the authentication plugin for that, but that would be overkill. Would
it be a better experience? Yes.

On Tue, Nov 10, 2015 at 11:24 AM, Upayavira <u...@odoko.co.uk> wrote:

> Noble,
>
> I get that a UI which is open source does not benefit from ACL control -
> we're not giving away anything that isn't public (other than perhaps
> info that could be used to identify the version of Solr, or even the
> fact that it *is* solr).
>
> However, from a user experience point of view, requiring credentials to
> see the UI would be more conventional, and therefore lead to less
> confusion. Is it possible for us to protect the UI static files, only
> for the sake of user experience, rather than security?
>
> Upayavira
>
> On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
> > The admin UI is a bunch of static pages. We don't let the ACL control
> > static content.
> >
> > You must blacklist all the core/collection APIs, and then it is pretty
> > much useless for anyone to access the admin UI (without the credentials,
> > of course).
> >
> > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 <mabaizh...@126.com> wrote:
> > > Hi,
> > >
> > > After I configure authentication with the Basic Authentication Plugin
> > > and authorization with the Rule-Based Authorization Plugin, how can I
> > > prevent strangers from visiting my Solr in a browser? For example, when
> > > a stranger visits http://(my host):8983, the browser should pop up a
> > > window that says "the server http://(my host):8983 requires a username
> > > and password".
> >
> >
> >
> > --
> > -
> > Noble Paul
>



-- 
Anshum Gupta


Re: Security Problems

2015-11-10 Thread Anshum Gupta
It has a cost :)

I think it'd make sense to restrict access to /admin and not really bother
about .css/.js etc. So if a user tries to access an image from the admin UI
directly, the request would go through, but that should be fine.
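As a rough illustration of that idea with the Rule-Based Authorization
Plugin, a permission along these lines could be set up. The credentials,
permission name, and role here are hypothetical, and whether this path rule
covers everything under /admin would need verification:

  curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization \
    -H 'Content-type:application/json' \
    -d '{"set-permission": {"name":"admin-paths", "collection": null, "path":"/admin/*", "role":"admin"}}'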

On Tue, Nov 10, 2015 at 12:22 PM, Upayavira <u...@odoko.co.uk> wrote:

> Is the authentication plugin that expensive?
>
> I can help by minifying the UI down to a smaller number of CSS/JS/etc
> files :-)
>
> It may be overkill, but it would also give a better experience. And isn't
> that what most applications do? Check authentication tokens on every
> request?
>
> Upayavira
>
> On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
> > The reason we bypass that is so that we don't hit the authentication
> > plugin for every request that comes in for static content. I think we
> > could call the authentication plugin for that, but that would be
> > overkill. Would it be a better experience? Yes.
> >
> > On Tue, Nov 10, 2015 at 11:24 AM, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > Noble,
> > >
> > > I get that a UI which is open source does not benefit from ACL control
> -
> > > we're not giving away anything that isn't public (other than perhaps
> > > info that could be used to identify the version of Solr, or even the
> > > fact that it *is* solr).
> > >
> > > However, from a user experience point of view, requiring credentials to
> > > see the UI would be more conventional, and therefore lead to less
> > > confusion. Is it possible for us to protect the UI static files, only
> > > for the sake of user experience, rather than security?
> > >
> > > Upayavira
> > >
> > > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
> > > > The admin UI is a bunch of static pages. We don't let the ACL control
> > > > static content.
> > > >
> > > > You must blacklist all the core/collection APIs, and then it is pretty
> > > > much useless for anyone to access the admin UI (without the
> > > > credentials, of course).
> > > >
> > > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 <mabaizh...@126.com> wrote:
> > > > > Hi,
> > > > >
> > > > > After I configure authentication with the Basic Authentication
> > > > > Plugin and authorization with the Rule-Based Authorization Plugin,
> > > > > how can I prevent strangers from visiting my Solr in a browser? For
> > > > > example, when a stranger visits http://(my host):8983, the browser
> > > > > should pop up a window that says "the server http://(my host):8983
> > > > > requires a username and password".
> > > >
> > > >
> > > >
> > > > --
> > > > -
> > > > Noble Paul
> > >
> >
> >
> >
> > --
> > Anshum Gupta
>



-- 
Anshum Gupta


Re: SolrCloud Startup question

2015-09-21 Thread Anshum Gupta
CloudSolrClient is thread safe and it is highly recommended you reuse the
client.

If you are providing an HttpClient instance while constructing, make sure
that the HttpClient uses a multi-threaded connection manager.

On Mon, Sep 21, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com> wrote:

> Thank you Anshum & Upayavira.
>
> BTW do any of you guys know if CloudSolrClient is ThreadSafe ??
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Monday, September 21, 2015, Anshum Gupta <ans...@anshumgupta.net>
> wrote:
>
> > Hi Ravi,
> >
> > I just tried it out and here's my understanding:
> >
> > 1. Starting Solr with -c starts Solr in cloud mode. This is used to start
> > Solr with an embedded zookeeper.
> > 2. Starting Solr with -z starts Solr in cloud mode, with the zk
> connection
> > string you specify. You don't need to explicitly specify -c in this case.
> > The help text there needs a bit of fixing though:
> >
> >   -z  ZooKeeper connection string; only used when running in
> >       SolrCloud mode using -c
> >       To launch an embedded ZooKeeper instance, don't pass
> >       this parameter.
> >
> > "only used when running in SolrCloud mode using -c" needs to be
> > rephrased or removed. Can you create a JIRA for the same?
> >
> >
> > On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >
> > > Can somebody kindly help me understand the difference between the
> > following
> > > startup calls ?
> > >
> > > ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > >
> > > Vs
> > >
> > > ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > >
> > > What happens if I don't pass the "-c" option? I read the documentation
> > > but got more confused. I do run a ZK ensemble of 3 instances. FYI, my
> > > cloud seems to work fine and the Admin UI shows the Cloud graph just
> > > fine, but I want to make sure I am doing the right thing and not
> > > missing any nuance.
> > >
> > > The following is from the documentation on cwiki.
> > > ---
> > >
> > > "Start Solr in SolrCloud mode, which will also launch the embedded
> > > ZooKeeper instance included with Solr.
> > >
> > > This option can be shortened to simply -c.
> > >
> > > If you are already running a ZooKeeper ensemble that you want to use
> > > instead of the embedded (single-node) ZooKeeper, you should also pass
> the
> > > -z parameter."
> > >
> > > -
> > >
> > > Thanks
> > >
> > > Ravi Kiran Bhaskar
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> >
>



-- 
Anshum Gupta


Re: SolrCloud Startup question

2015-09-21 Thread Anshum Gupta
Hi Ravi,

I just tried it out and here's my understanding:

1. Starting Solr with -c starts Solr in cloud mode. This is used to start
Solr with an embedded zookeeper.
2. Starting Solr with -z starts Solr in cloud mode, with the zk connection
string you specify. You don't need to explicitly specify -c in this case.
The help text there needs a bit of fixing though:

  -z  ZooKeeper connection string; only used when running in
      SolrCloud mode using -c
      To launch an embedded ZooKeeper instance, don't pass
      this parameter.

"only used when running in SolrCloud mode using -c" needs to be rephrased
or removed. Can you create a JIRA for the same?
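To spell out the two startup modes described above, a quick sketch of the
corresponding commands (the port and Solr home are hypothetical):

  # Cloud mode with the embedded single-node ZooKeeper:
  ./solr start -c -p 8983 -s /solr/home

  # Cloud mode against an external ZooKeeper ensemble; -c is implied:
  ./solr start -p 8983 -s /solr/home -z zk1:2181,zk2:2181,zk3:2181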


On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com> wrote:

> Can somebody kindly help me understand the difference between the following
> startup calls ?
>
> ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
>
> Vs
>
> ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
>
> What happens if I don't pass the "-c" option? I read the documentation
> but got more confused. I do run a ZK ensemble of 3 instances. FYI, my cloud
> seems to work fine and the Admin UI shows the Cloud graph just fine, but I
> want to make sure I am doing the right thing and not missing any nuance.
>
> The following is from the documentation on cwiki.
> ---
>
> "Start Solr in SolrCloud mode, which will also launch the embedded
> ZooKeeper instance included with Solr.
>
> This option can be shortened to simply -c.
>
> If you are already running a ZooKeeper ensemble that you want to use
> instead of the embedded (single-node) ZooKeeper, you should also pass the
> -z parameter."
>
> -
>
> Thanks
>
> Ravi Kiran Bhaskar
>



-- 
Anshum Gupta


Re: Securing solr 5.2 basic auth permission rules

2015-09-16 Thread Anshum Gupta
Basic authentication (and the API support that you're trying to use) was
only released with 5.3.0, so it won't work with 5.2.
5.2 only had the authentication and authorization frameworks, and shipped
with the Kerberos authentication plugin out of the box.

There are a few known issues with that though, and a 5.3.1 release is just
around the corner.

On Wed, Sep 16, 2015 at 10:11 AM, Aziz Gaou <gaoua...@gmail.com> wrote:

> Hi,
>
> I am trying to follow
>
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>
> to protect the Solr 5.2 admin with a password, but I have not been able to
> secure it.
>
> 1) When I run the following command:
>
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication
> -H 'Content-type:application/json' -d '{
>   "set-user": {"tom" : "TomIsCool" }}'
>
> there was no update to the security.json file
>
> 2) I launched the following 2 commands:
>
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization
> -H 'Content-type:application/json' -d '{"set-permission": {
> "name":"updates", "collection":"MyCollection", "role": "dev"}}'
>
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization
> -H 'Content-type:application/json' -d '{ "set-user-role": {"tom":["dev"]}}'
>
> MyCollection is still not protected.
>
>
> thank you for your help.
>



-- 
Anshum Gupta


Re: SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Anshum Gupta
This certainly can be fixed. Can you create a JIRA for the same? There
might be other calls that need fixing along similar lines.
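For reference, the Collections API call that this SolrJ request builds
looks like the sketch below; the 'name' parameter is exactly what Reload
fails to pick up from the process() argument (host and collection name
hypothetical):

  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"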

On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/11/2015 3:12 PM, Hendrik Haddorp wrote:
> > I'm using Solr 5.3.0 and noticed that the following code does not work
> > with Solr Cloud:
> > CollectionAdminRequest.Reload reloadReq = new
> > CollectionAdminRequest.Reload();
> > reloadReq.process(client, collection);
> >
> > It complains that the name parameter is required. When adding
> > reloadReq.setCollectionName(collection);
> > it works. But why would I need to specify the collection name twice?
>
> This might be an oversight in the code, or perhaps it is a situation
> where it won't be possible to handle the collection as a method argument.
>
> Can you give us the full error message (including any stacktrace) so I
> can look into it later this evening?
>
> Thanks,
> Shawn
>
>


-- 
Anshum Gupta


Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Anshum Gupta
Hi Merlin,

Solr 5.2.x only supported Kerberos out of the box and introduced a
framework to write your own authentication/authorization plugin. If you
don't use Kerberos, the only sensible way forward for you would be to wait
for the 5.3.1 release to come out and then move to it.

Until then, or without the upgrade, your best bet would be to try what
Davis suggested.

On Fri, Sep 11, 2015 at 7:30 AM, Merlin Morgenstern <
merlin.morgenst...@gmail.com> wrote:

> Thank you for the info.
>
> I have already downgraded to 5.2.x as this is a production setup.
> Unfortunately I have the same trouble there ... Any suggestions on how to
> fix this? What is the recommended procedure for securing the admin GUI on
> production setups?
>
> 2015-09-11 14:26 GMT+02:00 Noble Paul <noble.p...@gmail.com>:
>
> > There were some bugs with the 5.3.0 release and 5.3.1 is in the
> > process of getting released.
> >
> > try out the option #2 with the RC here
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/
> >
> >
> >
> > On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern
> > <merlin.morgenst...@gmail.com> wrote:
> > > OK, I downgraded to solr 5.2.x
> > >
> > > Unfortunately, still no luck. I followed two approaches:
> > >
> > > 1. Secure it the old-fashioned way, as described here:
> > >
> >
> http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password
> > >
> > > 2. Use the Basic Authentication Plugin, as described here:
> > > http://lucidworks.com/blog/securing-solr-basic-auth-permission-rules/
> > >
> > > Both approaches led to unsolved problems.
> > >
> > > While following option 1, I was able to secure the Admin UI with basic
> > > authentication, but was no longer able to access my application,
> > > despite the fact that it was working on Solr 3.x with the same type of
> > > authentication procedure and credentials.
> > >
> > > While following option 2, I was stuck right after uploading the
> > > security.json file to the ZooKeeper ensemble. The documented curl call
> > > to http://localhost:8983/solr/admin/authentication responded with a
> > > 404 Not Found, and then Solr could not connect to ZooKeeper. I had to
> > > remove that file from ZooKeeper and restart all Solr nodes.
> > >
> > > Could someone please show me the way to secure the Admin UI and
> > > password-protect SolrCloud? I have a perfectly running system with
> > > Solr 3.x and one core, and now taking it to SolrCloud 5.2.x in
> > > production seems to be stopped by simple authorization problems.
> > >
> > > Thank you in advance for any help.
> > >
> > >
> > >
> > > 2015-09-10 20:42 GMT+02:00 Noble Paul <noble.p...@gmail.com>:
> > >
> > >> Check this
> > https://cwiki.apache.org/confluence/display/solr/Securing+Solr
> > >>
> > >> There are a couple of bugs in 5.3.0, and a bug-fix release is coming
> > >> up over the next few days.
> > >>
> > >> We don't provide any specific means to restrict access to the admin UI
> > >> itself. However, we let users specify fine-grained ACLs on various
> > >> operations such as collection-admin-edit, read, etc.
> > >>
> > >> On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern
> > >> <merlin.morgenst...@gmail.com> wrote:
> > >> > I just installed solr cloud 5.3.x and found that the way to secure
> the
> > >> amin
> > >> > ui has changed. Aparently there is a new plugin which does role
> based
> > >> > authentification and all info on how to secure the admin UI found on
> > the
> > >> > net is outdated.
> > >> >
> > >> > I do not need role-based authentication; I simply want to put basic
> > >> > authentication on the Admin UI.
> > >> >
> > >> > How do I configure SolrCloud 5.3.x to restrict access to the Admin
> > >> > UI via basic authentication?
> > >> >
> > >> > Thank you for any help
> > >>
> > >>
> > >>
> > >> --
> > >> -
> > >> Noble Paul
> > >>
> >
> >
> >
> > --
> > -
> > Noble Paul
> >
>



-- 
Anshum Gupta


Re: Hash of solr documents

2015-08-26 Thread Anshum Gupta
Hi David,

The route key itself is indexed, but not the hash value. Why do you need to
know and display the hash value? This seems like an XY problem to me:
http://people.apache.org/~hossman/#xyproblem
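For background, a sketch of how the route key surfaces with the default
compositeId router (collection, host, and ids hypothetical): a document
indexed with id "tenantA!doc1" is routed by the hash of the "tenantA"
prefix, and queries can target that route without ever seeing the hash
itself:

  curl "http://localhost:8983/solr/mycollection/select?q=*:*&_route_=tenantA!"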

On Wed, Aug 26, 2015 at 1:17 AM, david.dav...@correo.aeat.es wrote:

 Hi,

 I have read in a post on the Internet that the hash SolrCloud
 calculates over the key field, to route each document to a shard,
 is indexed. Is this true? If so, is there any way to show this hash for
 each document?

 Thanks,

 David




-- 
Anshum Gupta


Re: splitting shards on 4.7.2 with custom plugins

2015-08-25 Thread Anshum Gupta
Can you elaborate a bit more on the setup: what do the custom plugins do,
and what error do you get? It seems like a classloader/classpath issue to
me, which doesn't really relate to shard splitting.


On Tue, Aug 25, 2015 at 7:59 PM, Jeff Courtade courtadej...@gmail.com
wrote:

 I am getting failures when trying to split shards on Solr 4.7.2 with
 custom plugins.

 It fails regularly: it cannot find the jar files for the plugins when
 creating the new cores/shards.

 Ideas?

 --
 Thanks,

 Jeff Courtade
 M: 240.507.6116




-- 
Anshum Gupta


Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient

2015-08-11 Thread Anshum Gupta
How did you create your collections? Also, is that verbatim from the logs
or is it just because you obfuscated that part while posting it here?

On Mon, Aug 10, 2015 at 11:02 PM, deniz denizdurmu...@gmail.com wrote:

 Hello Anshum,

 thanks for the quick reply

 I know it is being forwarded from one node to the leader node, but it
 shows different collection names, while the leader node's address is
 correct.

 I don't know if I am missing something, but my concern is the bold parts
 below:

 ERROR - 2015-08-11 05:04:34.592; [*CoreA* shard1 core_node2 *CoreA*]
 org.apache.solr.update.StreamingSolrClients$1; error
 org.apache.solr.common.SolrException: Bad Request
 request:

 http://server:8983/solr/*CoreB*/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2F*CoreB*%2F&wt=javabin&version=2

 So this is also normal?


 Anshum Gupta wrote
  Hi Deniz,
 
  Seems like the update that's being forwarded from a non-leader (original
  node that received the request) is failing. This could be due to multiple
  reasons, including issue with your schema vs document that you sent.
 
  To elaborate more, here's how a typical batched request in SolrCloud
  works.
 
  1. Batch sent from client.
  2. Received by node X.
  3. All documents that have their shard leader on node X, are processed
 and
  distributed to the replicas by node X. All other documents which belong
 to
  a shard who's leader isn't on Node X, get forwarded using the
  ConcurrentUpdateSolrClient to their respective leaders.
 
  There's nothing *strange* about this log, other than the fact that the
  update failed (and would have failed even if you would have directly sent
  the document to this node). Hope this made things clear.
 
  --
  Anshum Gupta





 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222338.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Anshum Gupta


Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient

2015-08-11 Thread Anshum Gupta
bq. adding it on admin interface of solr

Did you not use the Collections Admin API? If you try to create your own
cores using the Core Admin APIs instead of the Collections Admin APIs, you
could really end up shooting yourself in the foot. Also, the only supported
mechanism for creating a collection in Solr is via the Collections API.
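For comparison, a minimal sketch of the supported path, reusing the
upconfig command from this thread and adding a Collections API CREATE call
(the host and shard/replica counts are hypothetical):

  ./zkcli.sh -cmd upconfig -n CoreA -d /path/to/core/configs/CoreA/conf/ -z zk1:2181,zk2:2182,zk3:2183
  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=CoreA&numShards=1&replicationFactor=2&collection.configName=CoreA"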

On Mon, Aug 10, 2015 at 11:13 PM, deniz denizdurmu...@gmail.com wrote:

 I created them by simply creating the configs, using upconfig to upload
 them to ZooKeeper, and then adding the core in the Solr admin interface.

 I have only changed the IPs of server and server1 and changed the
 core/collection names to CoreA and CoreB; in the logs, CoreA and CoreB are
 different collections with different names.



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222341.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Anshum Gupta


Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient

2015-08-11 Thread Anshum Gupta
It's not entirely invalid but the only supported mechanism to create
collections is via the Collections admin API:

https://cwiki.apache.org/confluence/display/solr/Collections+API



On Mon, Aug 10, 2015 at 11:53 PM, deniz denizdurmu...@gmail.com wrote:

 okay, to make everything clear, here are the steps:

 - Creating configs etc and then running:

 ./zkcli.sh -cmd upconfig -n CoreA -d /path/to/core/configs/CoreA/conf/ -z
 zk1:2181,zk2:2182,zk3:2183

 - Then going to http://someserver:8983/solr/#/~cores

 - Clicking Add Core:
 
 http://lucene.472066.n3.nabble.com/file/n4222345/Screen_Shot_2015-08-11_at_14.png
 

 Repeating the last step on the other node as well.

 So this is invalid (incl https://wiki.apache.org/solr/CoreAdmin)?



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222345.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Anshum Gupta


Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient

2015-08-10 Thread Anshum Gupta
Hi Deniz,

Seems like the update that's being forwarded from a non-leader (original
node that received the request) is failing. This could be due to multiple
reasons, including issue with your schema vs document that you sent.

To elaborate more, here's how a typical batched request in SolrCloud works.

1. Batch sent from client.
2. Received by node X.
3. All documents that have their shard leader on node X are processed and
distributed to the replicas by node X. All other documents, which belong to
a shard whose leader isn't on node X, get forwarded to their respective
leaders using the ConcurrentUpdateSolrClient.

There's nothing *strange* about this log, other than the fact that the
update failed (and it would have failed even if you had sent the document
directly to this node). Hope this made things clear.
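To make step 1 concrete, a sketch of the kind of batched update that
triggers this forwarding; the batch can be sent to any node, and each
document is routed onward from there (host, collection, and documents
hypothetical):

  curl "http://server:8983/solr/CoreB/update?commit=true" \
    -H 'Content-type:application/json' \
    -d '[{"id":"doc1"},{"id":"doc2"},{"id":"doc3"}]'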

On Mon, Aug 10, 2015 at 10:45 PM, deniz denizdurmu...@gmail.com wrote:

 I have a simple 2-node (5.1) cloud environment with 6 different cores. One
 of the cores (CoreB) has an update issue which I am aware of, but in the
 Solr error logs I am seeing the entries below:

 ERROR - 2015-08-11 05:04:34.592; [*CoreA shard1 core_node2 CoreA*]
 org.apache.solr.update.StreamingSolrClients$1; error
 org.apache.solr.common.SolrException: Bad Request
 request:
 *
 http://server:8983/solr/CoreB*/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2FCoreB%2F&wt=javabin&version=2
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 ERROR - 2015-08-11 05:09:30.260; [CoreA shard1 core_node2 CoreA]
 org.apache.solr.update.StreamingSolrClients$1; error
 org.apache.solr.common.SolrException: Bad Request
 request:

 http://server:8983/solr/CoreB/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2FCoreB%2F&wt=javabin&version=2
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 ERROR - 2015-08-11 05:20:49.710; [gaysuser shard1 core_node2 gaysuser]
 org.apache.solr.update.StreamingSolrClients$1; error
 org.apache.solr.common.SolrException: Bad Request
 request:

 http://server:8983/solr/CoreB/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2FCoreB%2F&wt=javabin&version=2
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 ERROR - 2015-08-11 05:23:29.868; [CoreA shard1 core_node2 CoreA]
 org.apache.solr.update.StreamingSolrClients$1; error
 org.apache.solr.common.SolrException: Bad Request
 request:

 http://server:8983/solr/CoreB/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2FCoreB%2F&wt=javabin&version=2
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 Is this normal and just an issue with wrong logging params, or is there
 something wrong with the configs of the cloud?



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Anshum Gupta


Re: Programmatically find out if node is overseer

2015-07-17 Thread Anshum Gupta
As Shai mentioned, OVERSEERSTATUS is the most straightforward and
recommended way to go. It basically does what Erick suggested, i.e. gets
the first entry from '/overseer_elect/leader' in ZK.

Also, ideally, there shouldn't be a point where you have multiple active
Overseers in a single cluster.
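A quick sketch of that check from the command line (host and port
hypothetical); the 'leader' entry in the response names the current
overseer, which can be compared against the local node name:

  curl "http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json"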

On Thu, Jul 16, 2015 at 9:36 PM, Shai Erera ser...@gmail.com wrote:

 An easier way (IMO) and more 'official' is to use the CLUSTERSTATUS (

 https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18
 )
 or OVERSEERSTATUS (

 https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api17
 )
 API.

 The OVERSEERSTATUS returns a 'leader' item which says who is the overseer,
 at least as far as I understand. Not sure what is returned in case there
 are multiple nodes with the overseer role.

 The CLUSTERSTATUS returns an 'overseer' item with all nodes that have the
 overseer role assigned. I'm usually using that API to query for the status
 of my Solr cluster.

 Shai

 On Fri, Jul 17, 2015 at 3:55 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Look at the overseer election ephemeral nodes in ZK; the first one in
  line is the current overseer.
 
  Best,
  Erick
 
  On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma
  markus.jel...@openindex.io wrote:
   Hello - I need to run a thread on a single instance of a cloud, so I
   need to find out if the current node is the overseer. I know we can
   already programmatically find out if this replica is the leader of a
   shard via isLeader(). I have looked everywhere but I cannot find an
   isOverseer. I did find the election stuff but I am unsure if that is
   what I need to use.
  
   Any thoughts?
  
   Thanks!
   Markus
 




-- 
Anshum Gupta


Re: Programmatically find out if node is overseer

2015-07-17 Thread Anshum Gupta
It shouldn't happen unless you're using an older version of Solr (< 4.8), in
which case you might end up hitting SOLR-5859:
https://issues.apache.org/jira/browse/SOLR-5859.

On Fri, Jul 17, 2015 at 11:29 AM, solr.user.1...@gmail.com wrote:

 Hi Anshum what do you mean by:
 ideally, there shouldn't be a point where you have multiple active
 Overseers in a single cluster

 How can multiple Overseers happen? And what are the consequences?

 Regards

  On 17 Jul 2015, at 19:37, Anshum Gupta ans...@anshumgupta.net wrote:
 
  ideally, there shouldn't be a point where you have multiple active
  Overseers in a single cluster




-- 
Anshum Gupta


Re: Solr standalone + SSL and basic auth

2015-06-22 Thread Anshum Gupta
Hi,

Can you provide more context? Solr doesn't officially support the
'war' (Web application ARchive) any more.

What version of Solr is this? What are you trying to accomplish? Also, the
patches on SOLR-4460 are from over a year ago.

On Mon, Jun 22, 2015 at 5:22 AM, Fadi Mohsen fadi.moh...@gmail.com wrote:

 Create collection :


 /solr/admin/collections?action=CREATE&name=${collectionName}&numShards=5&replicationFactor=3&maxShardsPerNode=3

 On Mon, Jun 22, 2015 at 12:56 PM, Fadi Mohsen fadi.moh...@gmail.com
 wrote:

  Hi, I managed to wire up Jetty and the Solr war programmatically.

  After seeing SOLR-4470 (issues with inter-cluster/node client calls), we
  now set:
  HttpClientUtil.setConfigurer(new MyCustomHttpClientConfigurer());
  to set up clients before doing any inter-node calls.

  This is combined with:
  jettywebapp.setParentLoaderPriority(true)
  which means that the application and the war use the same classpath.

  All good so far: uploading configuration and creating collections works.
  But when querying the collection, numFound varies for each response.
  I'm guessing that something is preventing Solr from collecting a proper
  answer from the collection (all shards).
 
  We see these warnings in Solr logs:
 
  INFO  qtp678433396-57 update.PeerSync [Solr_335] [] PeerSync:
  core=test_o_txs_shard1_replica2 url=https://host1:9232/solr START
  replicas=[https://host3:9232/solr/test_o_txs_shard1_replica3/]
  nUpdates=100
  INFO  qtp678433396-57 update.PeerSync [Solr_335] [] PeerSync:
  core=test_o_txs_shard1_replica2 url=https://host1:9232/solr DONE.  We
  have no versions.  sync failed.
  INFO  RecoveryThread-test_o_txs_shard4_replica3 cloud.RecoveryStrategy
  [Solr_335] [] Attempting to PeerSync from
  https://host2:9232/solr/test_o_txs_shard4_replica1/
  core=test_o_txs_shard4_replica3 - recoveringAfterStartup=true
  INFO  RecoveryThread-test_o_txs_shard4_replica3 update.PeerSync
 [Solr_335]
  [] PeerSync: core=test_o_txs_shard4_replica3 url=https://host1:9232/solr
  START replicas=[https://host2:9232/solr/test_o_txs_shard4_replica1/]
  nUpdates=100
  WARN  RecoveryThread-test_o_txs_shard4_replica3 update.PeerSync
 [Solr_335]
  [] no frame of reference to tell if we've missed updates
  INFO  RecoveryThread-test_o_txs_shard4_replica3 cloud.RecoveryStrategy
  [Solr_335] [] PeerSync Recovery was not successful - trying replication.
  core=test_o_txs_shard4_replica3
  INFO  RecoveryThread-test_o_txs_shard3_replica1 cloud.RecoveryStrategy
  [Solr_335] [] Attempting to PeerSync from
  https://host3:9232/solr/test_o_txs_shard3_replica2/
  core=test_o_txs_shard3_replica1 - recoveringAfterStartup=true
  INFO  RecoveryThread-test_o_txs_shard3_replica1 update.PeerSync
 [Solr_335]
  [] PeerSync: core=test_o_txs_shard3_replica1 url=https://host1:9232/solr
  START replicas=[https://host3:9232/solr/test_o_txs_shard3_replica2/]
  nUpdates=100
  WARN  RecoveryThread-test_o_txs_shard3_replica1 update.PeerSync
 [Solr_335]
  [] no frame of reference to tell if we've missed updates
  INFO  RecoveryThread-test_o_txs_shard3_replica1 cloud.RecoveryStrategy
  [Solr_335] [] PeerSync Recovery was not successful - trying replication.
  core=test_o_txs_shard3_replica1
 
  any hints?
 
  Regards
  /Fadi
 




-- 
Anshum Gupta


Re: Please help test the new Angular JS Admin UI

2015-06-17 Thread Anshum Gupta
Also, while you are at it, it'd be good to get SOLR-4777 in so the Admin UI
is correct when users look at the SolrCloud graph after an operation that
can leave the slice INACTIVE, e.g. a shard split.

On Wed, Jun 17, 2015 at 2:50 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 This looks good overall and thanks for migrating it to something that more
 developers can contribute to.

 I started solr (trunk) in cloud mode using the bin scripts and opened the
 new admin UI. The section for 'cores' says 'No cores available. Go and
 create one'.
 Starting with Solr 5.0, we officially stated in the change log and other
 places that the only supported way to create a collection is through the
 Collections API. We should move along those lines and not stray with the
 new interface. I am not sure if the intention with this move is to first
 migrate everything as-is and then redo the design, but I'd strongly suggest
 that we do things the right way.

 On Sun, Jun 14, 2015 at 5:53 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 And anyone who, you know, really likes working with UI code, please
 help make it better!

 As of Solr 5.2, there is a new version of the Admin UI available, and
 several improvements are already in 5.2.1 (release imminent). The old
 admin UI is still the default, the new one is available at

 solr_ip:port/admin/index.html

 Currently, you will see very little difference at first glance; the
 goal for this release was to have as much of the current functionality
 as possible ported to establish the framework. Upayavira has done
 almost all of the work getting this in place, thanks for taking that
 initiative Upayavira!

 Anyway, the plan has several parts:
  - Get as much testing on this as possible over the 5.2 time frame.
  - Make the new AngularJS-based code the default in 5.3.
  - Make improvements/bug fixes to the admin UI on the new code line,
    particularly SolrCloud functionality.
  - Deprecate the current code and remove it eventually.

 The new code should be quite a bit easier to work on for programmer
 types, and there are Big Plans Afoot for making the admin UI more
 SolrCloud-friendly. Now that the framework is in place, it should be
 easier for anyone who wants to volunteer to contribute, please do!

 So please give it a whirl. I'm sure there will be things that crop up,
 and any help addressing them will be appreciated. There's already an
 umbrella JIRA for this work, see:
 https://issues.apache.org/jira/browse/SOLR-7666. Please link any new
 issues to this JIRA so we can keep track of it all as well as
 coordinate efforts. If all goes well, this JIRA can be used to see
 what's already been reported too.

 Note that things may be moving pretty quickly, so trunk and 5x will
 always be the most current. That said looking at 5.2.1 will be much
 appreciated.

 Erick




 --
 Anshum Gupta




-- 
Anshum Gupta


Re: Please help test the new Angular JS Admin UI

2015-06-17 Thread Anshum Gupta
This looks good overall and thanks for migrating it to something that more
developers can contribute to.

I started solr (trunk) in cloud mode using the bin scripts and opened the
new admin UI. The section for 'cores' says 'No cores available. Go and
create one'.
Starting with Solr 5.0, we officially stated in the change log and other
places that the only supported way to create a collection is through the
Collections API. We should move along those lines and not stray with the
new interface. I am not sure if the intention with this move is to first
migrate everything as-is and then redo the design, but I'd strongly suggest
that we do things the right way.

On Sun, Jun 14, 2015 at 5:53 PM, Erick Erickson erickerick...@gmail.com
wrote:

 And anyone who, you know, really likes working with UI code, please
 help make it better!

 As of Solr 5.2, there is a new version of the Admin UI available, and
 several improvements are already in 5.2.1 (release imminent). The old
 admin UI is still the default, the new one is available at

 solr_ip:port/admin/index.html

 Currently, you will see very little difference at first glance; the
 goal for this release was to have as much of the current functionality
 as possible ported to establish the framework. Upayavira has done
 almost all of the work getting this in place, thanks for taking that
 initiative Upayavira!

 Anyway, the plan has several parts:
  - Get as much testing on this as possible over the 5.2 time frame.
  - Make the new AngularJS-based code the default in 5.3.
  - Make improvements/bug fixes to the admin UI on the new code line,
    particularly SolrCloud functionality.
  - Deprecate the current code and remove it eventually.

 The new code should be quite a bit easier to work on for programmer
 types, and there are Big Plans Afoot for making the admin UI more
 SolrCloud-friendly. Now that the framework is in place, it should be
 easier for anyone who wants to volunteer to contribute, please do!

 So please give it a whirl. I'm sure there will be things that crop up,
 and any help addressing them will be appreciated. There's already an
 umbrella JIRA for this work, see:
 https://issues.apache.org/jira/browse/SOLR-7666. Please link any new
 issues to this JIRA so we can keep track of it all as well as
 coordinate efforts. If all goes well, this JIRA can be used to see
 what's already been reported too.

 Note that things may be moving pretty quickly, so trunk and 5x will
 always be the most current. That said looking at 5.2.1 will be much
 appreciated.

 Erick




-- 
Anshum Gupta


[ANNOUNCE] Apache Solr 5.2.0 and Reference Guide for Solr 5.2 released

2015-06-07 Thread Anshum Gupta
07 June 2015, Apache Solr™ 5.2 available

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search.  Solr is highly scalable, providing fault tolerant distributed
search and indexing, and powers the search and navigation features of many
of the world's largest internet sites.

Solr 5.2 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:
  https://lucene.apache.org/solr/5_2_0/changes/Changes.html

Solr 5.2 Release Highlights:

 * Restore API allows restoring a core from an index backup.

 * JSON Facet API
   * unique() is now implemented for numeric and date fields
   * Optional flatter form via a type parameter
   * Added support for mincount parameter in range facets to suppress
buckets less than that count
   * Multi-select faceting support for the Facet Module via the
excludeTags parameter which disregards any matching tagged filters for
that facet.
   * hll() facet function for distributed cardinality via HyperLogLog
algorithm.
See examples at http://yonik.com/solr-count-distinct/ and the short sketch
after this list of highlights.

 * A new facet.range.method parameter to let users choose how to do range
faceting between an implementation based on filters (previous algorithm,
using facet.range.method=filter) or DocValues (facet.range.method=dv)

 * Rule-based Replica assignment during collection, shard, and replica
creation.

 * Stats component:
   * New 'cardinality' option for stats.field, uses HyperLogLog to
efficiently estimate the cardinality of a field w/bounded RAM. Blog post:
https://lucidworks.com/blog/hyperloglog-field-value-cardinality-stats/
   * stats.field now supports individual local params for 'countDistinct'
and 'distinctValues'. 'calcdistinct' is still supported as an alias for
both options.

* Solr security
   * Authentication and Authorization frameworks that define interfaces,
and mechanisms to create, load, and use authorization/authentication
plugins have been added.
   * A Kerberos authentication plugin which would allow running a
Kerberized Solr setup.

* Solr Streaming Expressions. See
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

 * bin/post (and SimplePostTool in -Dauto=yes mode) now sends, rather than
skips, files without a known content type as application/octet-stream,
provided the file type is still in the allowed filetypes setting.

 * HDFS transaction log replication factor is now configurable

 * A cluster-wide property can now be added/edited/deleted using the
zkcli script and doesn't require a running Solr instance.

 * New spatial RptWithGeometrySpatialField, based on
CompositeSpatialStrategy, which blends RPT indexes for speed with
serialized geometry for accuracy.  Includes a Lucene segment based
in-memory shape cache.

 * Refactored Admin UI using AngularJS. It isn't the default, but a
parallel UI in this release.

 * Solr has internally been upgraded to use Jetty 9.
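As a taste of the JSON Facet API functions listed above, a minimal sketch
of a request combining unique() and hll(); the collection and field names
are hypothetical:

curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&
json.facet={
  distinct_manu : "unique(manu)",
  approx_manu : "hll(manu)"
}'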

Solr 5.2 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

For upgrading from 5.1, please look at the Upgrading from Solr 5.1
section in the change log.

Detailed change log:
http://lucene.apache.org/solr/5_2_0/changes/Changes.html

Also available is the Solr Reference Guide for Solr 5.2. This PDF serves as
the definitive user's manual for Solr 5.2. It can be downloaded from the
Apache mirror network: https://s.apache.org/Solr-Ref-Guide-PDF

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please try
another mirror.  This also goes for Maven access.

-- 
Anshum Gupta


Re: Shard still around after calling splitshard

2015-06-04 Thread Anshum Gupta
Hi Mike,

Once the SPLITSHARD call completes, it just marks the original shard as
inactive, i.e. it no longer accepts requests. So yes, you would have to use
DELETESHARD (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7)
to clean it up.

As for what you see in the admin UI, that information is wrong, i.e. the
UI does not respect the state of the shards while displaying them. So even
though the parent shard might be inactive, you would still end up seeing it
as just another active shard. There's an open issue for this one.

One way to confirm the shard state is to look at it in
clusterstate.json (or state.json, depending on the version of Solr you're
using).
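A sketch of the cleanup and verification described above, reusing the
collection and shard names from Mike's call (host hypothetical):

# Remove the now-inactive parent shard:
curl "http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=default-collection&shard=shard1"

# Confirm the states of the remaining shards:
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=default-collection&wt=json"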


On Thu, Jun 4, 2015 at 10:35 AM, Mike Thomsen mikerthom...@gmail.com
wrote:

 I thought splitshard was supposed to get rid of the original shard,
 shard1, in this case. Am I missing something? I was expecting the only two
 remaining shards to be shard1_0 and shard1_1.

 The REST call I used was
  /admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
 if that helps.

 Attached is a screenshot of the Cloud view in the admin console after
 running splitshard.

 Should it look like that? Do I need to delete shard1 now?

 Thanks,

 Mike




-- 
Anshum Gupta


Re: Verify a certain Replica contains a document

2015-05-18 Thread Anshum Gupta
I just tested out what you've mentioned and see the same behavior. I think
it calls for a JIRA and a fix.
distrib=false shouldn't consult ZK in my opinion, else it makes no sense to
have that param. I'm not sure, but it might just be a regression.

Can you create a JIRA? I'll take it up soon if no one else does.

On Fri, May 15, 2015 at 10:45 PM, Shai Erera ser...@gmail.com wrote:

 Yes. Here's what I do:

 Start two embedded Solr nodes (i.e. like using MiniSolrCloudCluster). They
 were started on ports 63175 and 63201.

 Create a collection with one shard and replica.
 /solr/admin/collections?action=clusterstatus shows it was created on
 127.0.0.1:63201_solr.

 Index a document: curl -i -X POST
 http://127.0.0.1:63175/solr/mycollection/update/json?commit=true -d
 '[{"id":"doc1"}]'

 Verify 63175 contains no cores:
 http://127.0.0.1:63175/solr/admin/cores?action=status
 Verify 63201 contains one core:
 http://127.0.0.1:63201/solr/admin/cores?action=status -- returns an index
 w/ numDocs=maxDoc=1.

 All of these return the document though:

 http://127.0.0.1:63175/solr/mycollection/select?q=*
 http://127.0.0.1:63175/solr/mycollection/select?q=*distrib=false
 http://127.0.0.1:63175/solr/mycollection_shard1_replica1/select?q=*

 http://127.0.0.1:63175/solr/mycollection_shard1_replica1/select?q=*distrib=false

 This returns "Can not find: /solr/core_node1/select" on both nodes (which
 is expected since there's no such core on any of the nodes):
 http://127.0.0.1:63175/solr/core_node1/select?q=*

 Shai

 On Sat, May 16, 2015 at 8:08 AM, Anshum Gupta ans...@anshumgupta.net
 wrote:

  Did you also try querying /core.name/select with distrib=false ?
 
  On Fri, May 15, 2015 at 9:22 PM, Shai Erera ser...@gmail.com wrote:
 
   Hi
  
   Is there a REST API in Solr that allows me to query a certain
  Replica/core?
   I am writing some custom replica-recovery code and I'd like to verify
  that
   it works well.
  
   I wanted to use the /collection/select API, passing
   shards=host.under.test:ip/solr/collection, but that also works even if
   'host.under.test' does not hold any local replicas. This makes sense
  from a
   distributed search perspective, but doesn't help me. Also, passing
   distrib=false, which I found by searching the web, didn't help and
 seems
  to
   be ignored, or at least there's still a fallback that makes
   'host.under.test' access the other nodes in the cluster to fulfill the
   request.
  
   Next I looked at the /admin/cores?action=STATUS API. This looks better
   as it allows me to list the cores on 'host.under.test', and I can get
   index-wide statistics such as numDocs and maxDoc. This is better because
   in my tests I know how many documents I should expect.
  
   But I was wondering if
  
   (1) Is Core admin API the proper way to achieve what I want, or is
 there
  a
   better way?
   (2) Is there core-specific API for select/get, like there is for
   /collection. I tried /core.name/select, but again, I received results
  even
   when querying the node w/ no local replicas.
  
   Shai
  
 
 
 
  --
  Anshum Gupta
 




-- 
Anshum Gupta


  1   2   3   >