Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
Thank you!  That worked.


From: Ahmet Arslan <iori...@yahoo.com>
Date: Wednesday, April 12, 2017 at 3:15 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>, David Kramer 
<david.kra...@shoebuy.com>
Subject: Re: Filtering results by minimum relevancy score

Hi,

I cannot find it. However it should be something like

q=hello&fq={!frange l=0.5}query($q)

Ahmet
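
For anyone reading this in the archive, here is a minimal sketch of how a
client might send that filter, assuming the second parameter in the query
above is fq. The host, the core name (products) and the helper name
min_score_params are made up for illustration; only the
{!frange l=...}query($q) filter comes from this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust the host and core name to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def min_score_params(user_query, min_score, rows=10, start=0):
    """Build Solr parameters that keep only documents scoring >= min_score."""
    return {
        "q": user_query,
        # query($q) evaluates the main query as a function value (its score);
        # frange keeps documents whose value is at least l.
        "fq": "{!frange l=%s}query($q)" % min_score,
        # Return the score too, so the cutoff can be inspected and tuned.
        "fl": "*,score",
        "rows": rows,
        "start": start,
    }


params = min_score_params("hello", 0.5)
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```

Because the cutoff runs as a filter inside Solr, rows and start page through
only the documents that survive it. The 0.5 threshold is just the value from
the example above; raw scores are not comparable across queries, so a fixed
cutoff would probably need tuning.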

On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan 
<iori...@yahoo.com.INVALID> wrote:
Hi David,
A function query named "query" returns the score for the given subquery.
Combined with the frange query parser, this is possible. I tried it in the past. I am 
searching for the original post. I think it was Yonik's post.
https://cwiki.apache.org/confluence/display/solr/Function+Queries


Ahmet


On Wednesday, April 12, 2017, 9:45:17 PM GMT+3, David Kramer 
<david.kra...@shoebuy.com> wrote:
The idea is to not return poorly matching results, not to limit the number of 
results returned.  One query may have hundreds of excellent matches and another 
query may have 7. So cutting off by the number of results is trivial but not 
useful.

Again, we are not doing this for performance reasons. We’re doing this because 
we don’t want to show products that are not very relevant to the search terms 
specified by the user for UX reasons.

I had hoped that the responses would have been more focused on “it can’t be 
done” or “here’s how to do it” than “you don’t want to do it”.  I’m still left 
not knowing if it’s even possible. The one concrete answer of using frange 
doesn’t help as referencing score in either the q or the fq produces an 
“undefined field” error.

Thanks.

On 4/11/17, 8:59 AM, "Dorian Hoxha" <dorian.ho...@gmail.com> wrote:

Can't the filter be used in cases when you're paginating in
sharded-scenario ?
So if you do limit=10, offset=10, each shard will return 20 docs ?
While if you do limit=10, _score<=last_page.min_score, then each shard will
return 10 docs ? (they will still score all docs, but merging will be
faster)

Makes sense ?

On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti <a.benede...@sease.io
> wrote:

> Can i ask what is the final requirement here ?
> What are you trying to do ?
>  - just display less results ?
> you can easily do at search client time, cutting after a certain amount
> - make search faster returning less results ?
> This is not going to work, as you need to score all of them as Erick
> explained.
>
> Function query ( as Mikhail specified) will run on a per document basis (if
> I am correct), so if your idea was to speed up the things, this is not going
> to work.
>
> It makes much more sense to refine your system to improve relevancy if your
> concern is to have more relevant docs.
> If your concern is just to not show that many pages, you can limit that
> client side.
>
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Filtering-results-by-minimum-relevancy-score-
> tp4329180p4329295.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
The idea is to not return poorly matching results, not to limit the number of 
results returned.  One query may have hundreds of excellent matches and another 
query may have 7. So cutting off by the number of results is trivial but not 
useful.

Again, we are not doing this for performance reasons. We’re doing this because 
we don’t want to show products that are not very relevant to the search terms 
specified by the user for UX reasons.

I had hoped that the responses would have been more focused on “it can’t be 
done” or “here’s how to do it” than “you don’t want to do it”.   I’m still left 
not knowing if it’s even possible. The one concrete answer of using frange 
doesn’t help as referencing score in either the q or the fq produces an 
“undefined field” error.

Thanks.

On 4/11/17, 8:59 AM, "Dorian Hoxha"  wrote:

Can't the filter be used in cases when you're paginating in
sharded-scenario ?
So if you do limit=10, offset=10, each shard will return 20 docs ?
While if you do limit=10, _score<=last_page.min_score, then each shard will
return 10 docs ? (they will still score all docs, but merging will be
faster)

Makes sense ?

On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti  wrote:

> Can i ask what is the final requirement here ?
> What are you trying to do ?
>  - just display less results ?
> you can easily do at search client time, cutting after a certain amount
> - make search faster returning less results ?
> This is not going to work, as you need to score all of them as Erick
> explained.
>
> Function query ( as Mikhail specified) will run on a per document basis (if
> I am correct), so if your idea was to speed up the things, this is not going
> to work.
>
> It makes much more sense to refine your system to improve relevancy if your
> concern is to have more relevant docs.
> If your concern is just to not show that many pages, you can limit that
> client side.
>
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io




Filtering results by minimum relevancy score

2017-04-10 Thread David Kramer
I’ve done quite a bit of searching on this.  Pretty much every page I find says 
it’s a bad idea and won’t work well, but I’ve been asked to at least try it to 
reduce the number of completely unrelated results returned.  We are not trying 
to normalize the number, or display it as a percentage, and I understand why 
those are not mathematically sound.  We are relying on Solr for pagination, so 
we can’t just filter out low scores from the results.

I had assumed that you could use score in the filter query, but that doesn’t 
appear to be the case.  Is there a special way to reference it, or is there 
another way to attack the problem?  It seems like something that should be 
allowed and possible.

Thanks.


Re: ChildDocTransformerFactory and returning only parents with children

2017-03-20 Thread David Kramer
I’ll be honest I didn’t understand most of what you wrote (like I said we’re 
just getting started with this).  We will most certainly need to do faceted 
search in future iterations so thanks for the “json.facets” reference.  And I 
do understand that the ChildDocTransformer is really for controlling what gets 
output and not for finding or filtering rows.

Your answer started me thinking about solving different parts of the problem in 
different parts of the query.  I got something that works now:
   q=title:"Under Armour" OR description:"Under Armour"
   fq={!parent which=docType:Product}color:*Blue*
   fl=title,description,brand,id,[child parentFilter="docType:Product" childFilter="color:*Blue*"]
This does show me only Under Armour products with blue items, and returns just 
the blue items nested inside the products.  That will work. There may be a more 
efficient/direct way of doing it, but at least we can move forward.  Is this a 
good approach?
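
A rough sketch of how the client side of this could be kept consistent: the
same child condition has to appear in both the fq (to drop parents with no
matching children) and the [child] transformer (to nest only the matching
children), so it helps to build both from one value. The helper names below
are hypothetical, and the field names are the ones from the query above.

```python
def parent_filter(child_query, parent_mask="docType:Product"):
    """Block-join filter: keep parents with at least one matching child."""
    return "{!parent which=%s}%s" % (parent_mask, child_query)


def child_transformer(child_query, parent_mask="docType:Product"):
    """[child] transformer that nests only the matching children."""
    return '[child parentFilter="%s" childFilter="%s"]' % (parent_mask, child_query)


def product_search_params(text, child_query, rows=10, start=0):
    """Search products by text, restricted to those with matching children."""
    # Note: text is inserted as-is; real code would need to escape quotes.
    return {
        "q": 'title:"%s" OR description:"%s"' % (text, text),
        "fq": parent_filter(child_query),
        "fl": "title,description,brand,id," + child_transformer(child_query),
        "rows": rows,
        "start": start,
    }


params = product_search_params("Under Armour", "color:*Blue*")
for name, value in params.items():
    print(name, "=", value)
```

Since the parent filter removes products without matching items before
paging, rows and start then count products only. Whether the same pattern
works for conditions on SKUs (for example a child query on size) is an
untested assumption on my part.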

With respect to multiple levels, it’s not a matter of trying to query more than 
two nested documents deep, it’s a matter of I haven’t seen a single example of 
how to query more than two levels.  The documentation and every example I found 
for ChildDocTransformer and Block Join just show parents and children.  A few 
hours ago Mikhail graciously sent me a link off-list to an article that 
basically says grandchildren are children too so you can search/filter on them 
as if they were children, and I understood most of it. Will have to dig into it 
more.

Thanks!

On 3/20/17, 1:20 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:

You should be able to nest things multiple levels deep. What happens
when you try?

For trying to find parents where children satisfy some criteria,
[child] result transformer is probably a bit later. You may want to
look into json.facets instead and search against children with
shifting domain up to parents after. Then, you also do the [child]
transformer to get the expanded children (if you need them).

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 20 March 2017 at 11:58, David Kramer <david.kra...@shoebuy.com> wrote:
> Hi.  We’re just ramping up a product search engine for our eCommerce 
site, so this is all new development and we are slowly building up our Solr 
knowledgebase, so thanks in advance for any guidance.
>
> Our catalog (mostly shoes and apparel) has three objects nested: Products 
(title, description, etc), items (color, price, etc), and SKU (size, etc).  
Since Solr doesn’t do documents nested three deep, the SKUs and items both get 
retrieved as children of products.  That has not bit us yet…  Also, our search 
results page expects a list of Item objects, then groups them (rolls them up) 
by their parent object.  Right now we are returning just the items, and that’s 
great, but we want to implement pagination of the products, so we need to 
return the items nested in products, then paginate on the products.
>
> If I send ‘q=docType:Product description:Armour&fl=title, 
description,id,[child parentFilter="docType:Product" 
childFilter="docType:Item"]’ I get a nice list of products with items nested 
inside them. Woot.
>
> The problem is, if we want to filter on item attributes, I get back 
products that have no children, which means we can’t paginate on the results if 
we remove those parents.  For instance, send ‘q=docType:Product 
description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, we 
get the products and their items nicely nested, and only items with a price of 
49.99 are shown, but so are parents that have no matching items.
>
> How can I build a query that will not return parents without children? I 
haven’t figured out a way to reference the children in the query.
>
> Since we’re not in production yet, I can change lots of things here.  I 
would PREFER not to denormalize the documents into one document per SKU with 
all the item and product information too, as our catalog is quite large and 
that would lead to a huge import file and lots of duplicated content between 
documents in the index.  If that’s the only way, though, it is possible.
>
> Thanks in advance.




ChildDocTransformerFactory and returning only parents with children

2017-03-20 Thread David Kramer
Hi.  We’re just ramping up a product search engine for our eCommerce site, so 
this is all new development and we are slowly building up our Solr 
knowledgebase, so thanks in advance for any guidance.

Our catalog (mostly shoes and apparel) has three objects nested: Products 
(title, description, etc), items (color, price, etc), and SKU (size, etc).  
Since Solr doesn’t do documents nested three deep, the SKUs and items both get 
retrieved as children of products.  That has not bit us yet…  Also, our search 
results page expects a list of Item objects, then groups them (rolls them up) 
by their parent object.  Right now we are returning just the items, and that’s 
great, but we want to implement pagination of the products, so we need to 
return the items nested in products, then paginate on the products.

If I send ‘q=docType:Product description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item"]’ I get a nice list 
of products with items nested inside them. Woot.

The problem is, if we want to filter on item attributes, I get back products 
that have no children, which means we can’t paginate on the results if we 
remove those parents.  For instance, send ‘q=docType:Product 
description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, we 
get the products and their items nicely nested, and only items with a price of 
49.99 are shown, but so are parents that have no matching items.

How can I build a query that will not return parents without children? I 
haven’t figured out a way to reference the children in the query.

Since we’re not in production yet, I can change lots of things here.  I would 
PREFER not to denormalize the documents into one document per SKU with all the 
item and product information too, as our catalog is quite large and that would 
lead to a huge import file and lots of duplicated content between documents in 
the index.  If that’s the only way, though, it is possible.

Thanks in advance.


Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-06 Thread David Kramer
For closure, I’ve solved the problem!  It was not using my schema.xml at all.  
I had to change the solrconfig.xml to include  and comment out the schema adding processor.

My schema still didn’t work right, but I took the managed-schema and renamed it 
and changed uniqueKey to uuid and everything worked!

Thanks for your time and help.


On 2/2/17, 4:35 PM, "David Kramer" <david.kra...@shoebuy.com> wrote:

Yes, think of the starving orphan records…

Ours is an eCommerce system, selling mostly shoes.  We have three levels of 
nested objects representing what we sell:
- Product: Mostly title and description
- Item: A specific color and some other attributes, including price. 
Products have 1 or more Items, Items belong to one product.
- SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong 
to one Item.
[PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]

Products, items, and SKUs all have ID numbers. One product will never have 
the same ID as another product, but it’s possible for a product to have the 
same ID as an Item or a SKU. And that is the problem.  So the program that 
creates the import file adds a new field called uuid, that is a P, I, or S (for 
Product, Item, or SKU) followed by the ID.  We did it this way because my 
understanding is Solr can’t implement a compound unique key.  The uuid is 
unique across all documents, not just all documents of the same docType.

So in the case of my unique test to see if it would complain if the UUID of 
a document I was inserting was not unique, I grabbed the first few products 
from the full import file, and changed the IDs so they are not duplicates of 
the real data, but left the UUIDs alone, so they are duplicates of the real 
data, which was already loaded.  

My expectation was that when I loaded the data I would get some  error 
saying that UUID was already used.  YOUR expectation is that the record would 
be overwritten.  What actually happened is that the new documents got added 
with their duplicate UUIDs, which is the worst possible case.  This is why I 
think it’s not respecting my uniqueKey setting in schema.xml.

Does that make more sense?  I hope you can help me understand this 
discrepancy. Thanks for your efforts so far.

On 2/2/17, 3:13 PM, "Mikhail Khludnev" <m...@apache.org> wrote:

David,
I hardly get the way which IDs are assigned, but beware that repeating uniqueKey
value causes deleting former occurrence. In case of block join index it
corrupts block structure: parent can't be deleted and left children orphans
(.. so touching, I'm sorry). Just make sure that number of deleted docs is
0 at first.

On Thu, Feb 2, 2017 at 6:20 PM, David Kramer <david.kra...@shoebuy.com>
wrote:

> Thanks for responding, Mikhail.  There are no deleted documents.  Since
> I’m fairly new to Solr, one of the things I’ve been paranoid about is I
> have no way of validating my schema.xml, or know whether Solr is even using
> it (I have evidence it’s not, more below). So for each test, I’ve wiped out
> the index, recreated, and reimported.
>
> Back to whether my schema.xml is being used, I mentioned that I had to
> come up with a compound UUID field of the first character of the docType
> plus the ID, and we put “uuid” (was id) in our
> schema.xml.  Then I deleted and recreated the index and restarted Solr.  In
> order to verify it was working, I created an import file that had unique
> IDs but UUIDs which were duplicates of existing records, and it imported
> the new records even though the UUIDs existed in the database already.  I’m
> not sure if Solr should have produced an error or not. I’ll research that,
> but I mention that here in case it’s relevant.
>
> Thanks.
>
> On 2/2/17, 6:10 AM, "Mikhail Khludnev" <m...@apache.org> wrote:
>
> David,
>
> Can you make sure your index doesn't have deleted docs? This can be seen
> in Solr Admin.
> And can you merge index to avoid having them in the index?
>
> On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <
> david.kra...@shoebuy.com>
> wrote:
>
> >
> >
> > Some background:
> > · The data involved is catalog data, with three nested
> objects:
> > Products, Items, and Skus, in that order. We have a docType 
field on
> each
> > record as a differentiator.
 

Re: Issues with uniqueKey != id?

2017-02-06 Thread David Kramer
I’m just setting that up now. I’m far from a Solr expert, so I won’t swear we’re 
doing it right, though.

Our issue is that we have documents, nested 3 deep.  Products, Items, and SKUs. 
 Each has an ID field that’s unique within the document type, but unfortunately 
we have products with the same ID as Items, etc.  So we created a new field 
UUID that’s a concatenation of the document type (first letter, actually) and 
the ID, which is unique.  

The program that creates the import file builds that field, as it’s my 
understanding you can’t use copyfield for the unique key field for some reason 
related to SolrCloud (sorry I don’t have the URL for where I saw that).  I 
would love to be able to copyfield them together though and have the import 
file be smaller.

On 2/3/17, 11:49 AM, "Matthias X Falkenberg"  wrote:

Howdy,

In the Solr Wiki I stumbled upon a somewhat vague statement on the 
uniqueKey:

>  https://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
>  It shouldn't matter whether you rename this to something else (and 
change the  value), but occasionally it has in the past. We 
recommend that you just leave this definition alone. 

I'd be very grateful for any positive or negative experiences with 
"uniqueKey" not being set to "id" - especially if your experiences are 
related to Solr 6.2.1+.

Many thanks,

Matthias Falkenberg

IBM Deutschland Research & Development GmbH / Vorsitzende des 
Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
HRB 243294





Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread David Kramer
Yes, think of the starving orphan records…

Ours is an eCommerce system, selling mostly shoes.  We have three levels of 
nested objects representing what we sell:
- Product: Mostly title and description
- Item: A specific color and some other attributes, including price. Products 
have 1 or more Items, Items belong to one product.
- SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong to 
one Item.
[PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]

Products, items, and SKUs all have ID numbers. One product will never have the 
same ID as another product, but it’s possible for a product to have the same ID 
as an Item or a SKU. And that is the problem.  So the program that creates the 
import file adds a new field called uuid, that is a P, I, or S (for Product, 
Item, or SKU) followed by the ID.  We did it this way because my understanding 
is Solr can’t implement a compound unique key.  The uuid is unique across all 
documents, not just all documents of the same docType.
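
To make the uuid scheme concrete, here is a small sketch of how the import
program could stamp it on every nested document. The function names are
hypothetical; the prefix rule (first letter of docType followed by the ID)
and the _childDocuments_ key are as described above.

```python
def make_uuid(doc_type, doc_id):
    """Return the first letter of the docType followed by the ID."""
    return doc_type[0].upper() + str(doc_id)


def add_uuids(doc):
    """Set uuid on a document and, recursively, on its nested children."""
    doc["uuid"] = make_uuid(doc["docType"], doc["id"])
    for child in doc.get("_childDocuments_", []):
        add_uuids(child)
    return doc


product = {
    "id": 739063,
    "docType": "Product",
    "_childDocuments_": [
        {
            "id": 1537378,
            "docType": "Item",
            "_childDocuments_": [{"id": 12799578, "docType": "Sku"}],
        }
    ],
}
add_uuids(product)
print(product["uuid"])
```

This only stays unique as long as no two docTypes share a first letter,
which holds for Product, Item and Sku.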

So in the case of my unique test to see if it would complain if the UUID of a 
document I was inserting was not unique, I grabbed the first few products from 
the full import file, and changed the IDs so they are not duplicates of the 
real data, but left the UUIDs alone, so they are duplicates of the real data, 
which was already loaded.  

My expectation was that when I loaded the data I would get some  error saying 
that UUID was already used.  YOUR expectation is that the record would be 
overwritten.  What actually happened is that the new documents got added with 
their duplicate UUIDs, which is the worst possible case.  This is why I think 
it’s not respecting my uniqueKey setting in schema.xml.

Does that make more sense?  I hope you can help me understand this discrepancy. 
Thanks for your efforts so far.

On 2/2/17, 3:13 PM, "Mikhail Khludnev" <m...@apache.org> wrote:

David,
I hardly get the way which IDs are assigned, but beware that repeating uniqueKey
value causes deleting former occurrence. In case of block join index it
corrupts block structure: parent can't be deleted and left children orphans
(.. so touching, I'm sorry). Just make sure that number of deleted docs is
0 at first.

On Thu, Feb 2, 2017 at 6:20 PM, David Kramer <david.kra...@shoebuy.com>
wrote:

> Thanks for responding, Mikhail.  There are no deleted documents.  Since
> I’m fairly new to Solr, one of the things I’ve been paranoid about is I
> have no way of validating my schema.xml, or know whether Solr is even using
> it (I have evidence it’s not, more below). So for each test, I’ve wiped out
> the index, recreated, and reimported.
>
> Back to whether my schema.xml is being used, I mentioned that I had to
> come up with a compound UUID field of the first character of the docType
> plus the ID, and we put “uuid” (was id) in our
> schema.xml.  Then I deleted and recreated the index and restarted Solr.  In
> order to verify it was working, I created an import file that had unique
> IDs but UUIDs which were duplicates of existing records, and it imported
> the new records even though the UUIDs existed in the database already.  I’m
> not sure if Solr should have produced an error or not. I’ll research that,
> but I mention that here in case it’s relevant.
>
> Thanks.
>
> On 2/2/17, 6:10 AM, "Mikhail Khludnev" <m...@apache.org> wrote:
>
> David,
>
> Can you make sure your index doesn't have deleted docs? This can be seen
> in Solr Admin.
> And can you merge index to avoid having them in the index?
>
> On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <
> david.kra...@shoebuy.com>
> wrote:
>
> >
> >
> > Some background:
> > · The data involved is catalog data, with three nested
> objects:
> > Products, Items, and Skus, in that order. We have a docType field on
> each
> > record as a differentiator.
> > · The "id" field in our data is unique within datatype, but
> not
> > across datatypes. We added a "uuid" field in our program that
> generates the
> > Solr import file that is the id prefixed by the first letter of the
> > docType, like P12345. That makes the uuid field unique, and we have
> that as
> > the uniqueKey in our schema.xml.
> > · We are trying to retrieve the parent Product, and all
> children
> > documents. As such, we are using the ChildDocTransformerFactory
> > ([child...]) to retrieve the children along with the parent. We h

Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread David Kramer
Thanks for responding, Mikhail.  There are no deleted documents.  Since I’m 
fairly new to Solr, one of the things I’ve been paranoid about is I have no way 
of validating my schema.xml, or know whether Solr is even using it (I have 
evidence it’s not, more below). So for each test, I’ve wiped out the index, 
recreated, and reimported. 

Back to whether my schema.xml is being used, I mentioned that I had to come up 
with a compound UUID field of the first character of the docType plus the ID, 
and we put “uuid” (was id) in our schema.xml.  Then I 
deleted and recreated the index and restarted Solr.  In order to verify it was 
working, I created an import file that had unique IDs but UUIDs which were 
duplicates of existing records, and it imported the new records even though the 
UUIDs existed in the database already.  I’m not sure if Solr should have 
produced an error or not. I’ll research that, but I mention that here in case 
it’s relevant.

Thanks.

On 2/2/17, 6:10 AM, "Mikhail Khludnev" <m...@apache.org> wrote:

David,

Can you make sure your index doesn't have deleted docs? This can be seen
in Solr Admin.
And can you merge index to avoid having them in the index?

On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <david.kra...@shoebuy.com>
wrote:

>
>
> Some background:
> · The data involved is catalog data, with three nested objects:
> Products, Items, and Skus, in that order. We have a docType field on each
> record as a differentiator.
> · The "id" field in our data is unique within datatype, but not
> across datatypes. We added a "uuid" field in our program that generates the
> Solr import file that is the id prefixed by the first letter of the
> docType, like P12345. That makes the uuid field unique, and we have that as
> the uniqueKey in our schema.xml.
> · We are trying to retrieve the parent Product, and all children
> documents. As such, we are using the ChildDocTransformerFactory
> ([child...]) to retrieve the children along with the parent. We have not
> yet solved the problem of getting items within SKUs as nested documents in
> the results, and we will have to figure that out at some point, but for now
> we get them flattened
> · We are building out the proof of concept for this. This is all
> new work, so we are free to change a lot.
> · This is Solr 6.0.0, and we are importing in JSON format, if that
> matters
> · I submitted this question to StackOverflow
> <http://stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-childdoctransformerfactory-get-parent-quer>
> but haven’t gotten any answers yet.
>
>
> Our data looks like this (I've removed some fields for simplicity):
>
> {
>   "id": 739063,
>   "docType": "Product",
>   "uuid": "P739063",
>   "_childDocuments_": [
>     {
>       "id": 1537378,
>       "price": 25.45,
>       "color": "Blush",
>       "docType": "Item",
>       "productId": 739063,
>       "uuid": "I1537378",
>       "_childDocuments_": [
>         {
>           "id": 12799578,
>           "size": "10",
>           "width": "W",
>           "docType": "Sku",
>           "itemId": 1537378,
>           "uuid": "S12799578"
>         }
>       ]
>     }
>   ]
> }
>
>
>
> The query to fetch all Products and their children nested inside them is
> q=docType:Product&fl=title,id,docType,[child
> parentFilter=docType:Product]. When I run that query, all is well, and it
> returns the first 10 rows. However, if I fetch more rows by adding, say
> &rows=500, we get the error Parent query yields document which is not
> matched by parents filter, docID=XXX.
>
> When we first saw that error, we discovered our id field was not unique
> across document types, so we added the uuid field as mentioned above, which
> is. We also set it as the uniqueKey in our schema.xml file, wiped the core,
> recreated it, and restarted Solr just to make sure it was in effect. We have
> double checked and are sure that the uuid fields are unique.
  

Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-01 Thread David Kramer


Some background:
· The data involved is catalog data, with three nested objects: 
Products, Items, and Skus, in that order. We have a docType field on each 
record as a differentiator.
· The "id" field in our data is unique within datatype, but not across 
datatypes. We added a "uuid" field in our program that generates the Solr 
import file that is the id prefixed by the first letter of the docType, like 
P12345. That makes the uuid field unique, and we have that as the uniqueKey in 
our schema.xml.
· We are trying to retrieve the parent Product, and all children 
documents. As such, we are using the ChildDocTransformerFactory ([child...]) to 
retrieve the children along with the parent. We have not yet solved the problem 
of getting items within SKUs as nested documents in the results, and we will 
have to figure that out at some point, but for now we get them flattened
· We are building out the proof of concept for this. This is all new 
work, so we are free to change a lot.
· This is Solr 6.0.0, and we are importing in JSON format, if that 
matters
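· I submitted this question to StackOverflow 
<http://stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-childdoctransformerfactory-get-parent-quer> 
but haven’t gotten any answers yet.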


Our data looks like this (I've removed some fields for simplicity):

{
  "id": 739063,
  "docType": "Product",
  "uuid": "P739063",
  "_childDocuments_": [
    {
      "id": 1537378,
      "price": 25.45,
      "color": "Blush",
      "docType": "Item",
      "productId": 739063,
      "uuid": "I1537378",
      "_childDocuments_": [
        {
          "id": 12799578,
          "size": "10",
          "width": "W",
          "docType": "Sku",
          "itemId": 1537378,
          "uuid": "S12799578"
        }
      ]
    }
  ]
}



The query to fetch all Products and their children nested inside them is 
q=docType:Product&fl=title,id,docType,[child parentFilter=docType:Product]. 
When I run that query, all is well, and it returns the first 10 rows. However, 
if I fetch more rows by adding, say &rows=500, we get the error Parent query 
yields document which is not matched by parents filter, docID=XXX.

When we first saw that error, we discovered our id field was not unique across 
document types, so we added the uuid field as mentioned above, which is. We 
also set it as the uniqueKey in our schema.xml file, wiped the core, recreated 
it, and restarted Solr just to make sure it was in effect. We have double 
checked and are sure that the uuid fields are unique.
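
For reference, the uniqueness check on the import data can be sketched
roughly like this. It only inspects the import file contents, not what Solr
actually indexed, and the function names and the sample data are made up
for illustration (the second product reuses the first one's uuid on
purpose).

```python
from collections import Counter


def iter_docs(doc):
    """Yield a document and all of its nested _childDocuments_, depth-first."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from iter_docs(child)


def duplicate_uuids(top_level_docs):
    """Return the sorted list of uuid values that occur more than once."""
    counts = Counter(
        doc["uuid"] for top in top_level_docs for doc in iter_docs(top)
    )
    return sorted(uuid for uuid, n in counts.items() if n > 1)


sample = [
    {
        "id": 739063, "docType": "Product", "uuid": "P739063",
        "_childDocuments_": [
            {"id": 1537378, "docType": "Item", "uuid": "I1537378",
             "_childDocuments_": [
                 {"id": 12799578, "docType": "Sku", "uuid": "S12799578"},
             ]},
        ],
    },
    # Different id, same uuid as the product above.
    {"id": 999999, "docType": "Product", "uuid": "P739063"},
]
print(duplicate_uuids(sample))
```

An empty list means every uuid in the file is unique across all three
document types.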



In all the search results for that error that I've found, the OP did not have a 
field that could differentiate the different document types, but as you see we 
do. Since both the query and the parentFilter are searching for docType:Product 
I don't see how either could possibly return anything but parents. We've also 
tried adding childFilter=docType:Item and childFilter=docType:Sku but that did 
not help.  I also tried using title:* for the filter since only products have 
titles.



Is there anything else we can try?

Any explanation of this?

Is it possible that it's not using uuid as the unique identifier even though 
it's specified in the schema.xml, and would that even cause this?

Thanks.