[
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895297#action_12895297
]
David Tuška commented on SOLR-236:
----------------------------------
Hello, I found some bugs in "Field collapsing".
I tested it with solr-1.4.1-patch and will try to test it with trunk-patch (rev. 955615)
too.
1) No collapse_counts/results are returned if collapseCount==1,
although the non-collapsed documents are returned.
http://localhost:8080/solr_tour/select/?q=nl_counter%3A1%0D%0A&start=0&rows=10&indent=on&sort=c_price_from_orig+asc&collapse.field=nl_tour_id&collapse.threshold=1&collapse.type=adjacent&collapse.debug=true
{code:xml}
<lst name="collapse_counts">
<str name="field">nl_tour_id</str>
<lst name="results"/>
<lst name="debug">
<str name="Docset type">HashDocSet(26)</str>
<long name="Total collapsing time(ms)">0</long>
<long name="Create uncollapsed docset(ms)">0</long>
<long name="Get fieldvalues from fieldcache (ms)">0</long>
<long name="AdjacentDocumentCollapser collapsing time(ms)">0</long>
<long name="Creating collapseinfo time(ms)">0</long>
<long name="Convert to bitset time(ms)">0</long>
<long name="Create collapsed docset time(ms)">0</long>
</lst>
</lst>
<result name="response" numFound="26" start="0">
10x <doc></doc>
...
{code}
Looking into the code, I found some problematic parts.
In NonAdjacentDocumentCollapser.java, the function doCollapsing has a bad condition
and a problem with the priorityQueue:
{code:title=NonAdjacentDocumentCollapser.java}
protected void doCollapsing(DocSet uncollapsedDocset, FieldCache.StringIndex values) {
  for (DocIterator i = uncollapsedDocset.iterator(); i.hasNext();) {
    int currentId = i.nextDoc();
    String currentValue = values.lookup[values.order[currentId]];
    NonAdjacentCollapseGroup collapseDoc = collapsedDocs.get(currentValue);
    if (collapseDoc == null) {
      ..
    }
    Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);
    // IMHO this must be >=, not >
    if (++collapseDoc.totalCount > collapseThreshold) {
      collapseDoc.collapsedDocuments++;
      // This is a problem too: with collapse.threshold=1, if the group has only one doc,
      // collapseDoc.priorityQueue.insertWithOverflow does not return it
      if (dropOutId != null) {
        for (CollapseCollector collector : collectors) {
          collector.documentCollapsed(dropOutId, collapseDoc, collapseContext);
        }
      }
    }
  }
}
{code}
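The effect of the two points marked in the comments can be checked with a small standalone simulation. This is only a sketch and does not use Solr or Lucene: the class ThresholdSketch, the method simulateGroup, and the local insertWithOverflow helper are hypothetical names, and the helper only assumes that the bounded queue returns the dropped element, or null while the queue is not yet full.

```java
import java.util.PriorityQueue;

class ThresholdSketch {
    // Hypothetical stand-in for the bounded queue used by the collapser:
    // returns the least element once the queue exceeds maxSize, else null.
    static Integer insertWithOverflow(PriorityQueue<Integer> queue, int maxSize, int docId) {
        queue.add(docId);
        return queue.size() > maxSize ? queue.poll() : null;
    }

    // Simulates one collapse group of groupSize docs sharing a field value.
    // Returns {collapsedDocuments, collectorCalls}.
    // inclusive=false mirrors the quoted '>' condition, inclusive=true the proposed '>='.
    static int[] simulateGroup(int groupSize, int threshold, boolean inclusive) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        int totalCount = 0;
        int collapsedDocuments = 0;
        int collectorCalls = 0;
        for (int docId = 0; docId < groupSize; docId++) {
            Integer dropOutId = insertWithOverflow(queue, threshold, docId);
            totalCount++;
            boolean collapse = inclusive ? totalCount >= threshold : totalCount > threshold;
            if (collapse) {
                collapsedDocuments++;
                // Collectors are only notified when the queue dropped a document.
                if (dropOutId != null) {
                    collectorCalls++;
                }
            }
        }
        return new int[] {collapsedDocuments, collectorCalls};
    }
}
```

For a group of one document and threshold 1, the sketch reports no collector call with either comparison, because the queue never overflows. Changing only the comparison would therefore raise the counter without producing a documentCollapsed call in this model.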
In AdjacentDocumentCollapser.java, doCollapsing has a problem in the "Initializing"
condition:
if there is only one doc, only the initializing branch is processed; the else-if and else
parts are not processed, so collector.documentCollapsed or
collector.documentHead is never called.
{code:title=AdjacentDocumentCollapser.java}
protected void doCollapsing(DocSet uncollapsedDocset, FieldCache.StringIndex values) {
  ...
  String collapseValue = null;
  ...
  for (DocIterator i = uncollapsedDocset.iterator(); i.hasNext();) {
    int currentId = i.nextDoc();
    String currentValue = values.lookup[values.order[currentId]];
    // Initializing
    if (collapseValue == null) {
      repeatCount = 0;
      collapseCount = 0;
      collapseId = currentId;
      collapseValue = currentValue;
      // Collapse the document if the field value is the same and
      // we have a run of at least collapseThreshold uncollapsedDocset.
    }
    // IMHO this must be "if", not "else if"
    else if (collapseValue.equals(currentValue)) {
      if (++repeatCount >= collapseThreshold) {
        collapseCount++;
        for (CollapseCollector collector : collectors) {
          CollapseGroup valueToCollapse = new AdjacentCollapseGroup(collapseId, currentValue);
          collector.documentCollapsed(currentId, valueToCollapse, collapseContext);
        }
      } else {
        addDoc(currentId);
      }
    } else {
      ...
    }
    ...
  }
  ...
}
{code}
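The control flow described above can be traced with a minimal standalone sketch that only records which branch each document takes. AdjacentBranchSketch and branchesTaken are hypothetical names, and the body of the final else branch is elided in the excerpt, so resetting collapseValue there is an assumption made only to keep the trace going.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacentBranchSketch {
    // Records the branch taken for each document's field value, in order.
    static List<String> branchesTaken(String[] values) {
        List<String> branches = new ArrayList<>();
        String collapseValue = null;
        for (String currentValue : values) {
            if (collapseValue == null) {
                // Initializing branch: no collector is called here in the excerpt.
                collapseValue = currentValue;
                branches.add("init");
            } else if (collapseValue.equals(currentValue)) {
                // Same value as the current run: this is where documentCollapsed may fire.
                branches.add("same");
            } else {
                // Assumption: the elided else branch starts a new run with the new value.
                collapseValue = currentValue;
                branches.add("different");
            }
        }
        return branches;
    }
}
```

With a single input value the trace contains only "init", which matches the observation that neither the else-if nor the else part runs when the result holds one document.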
2) I have a problem with sorting. I need to sort the CollapseGroups by the
c_price_from_orig field,
but if I have "sort=c_price_from_orig+asc" in the request,
the returned CollapseGroups are sorted by c_price_from_orig (the minimum over the
collapsed docs in the group),
but some CollapseGroups are skipped and the doc with the minimum c_price_from_orig is not
returned first!
I will try to debug this problem and report it in more detail.
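One reading of the ordering I expect, as a standalone sketch rather than Solr code: group the docs by the collapse field, take the lowest price in each group, and sort the groups by that value. GroupSortSketch and groupsByMinPrice are hypothetical names, and the arrays stand in for the nl_tour_id and c_price_from_orig field values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupSortSketch {
    // Returns the group keys ordered ascending by the lowest price seen in each group.
    static List<String> groupsByMinPrice(String[] tourIds, double[] prices) {
        final Map<String, Double> minPrice = new LinkedHashMap<>();
        for (int i = 0; i < tourIds.length; i++) {
            Double seen = minPrice.get(tourIds[i]);
            if (seen == null || prices[i] < seen) {
                minPrice.put(tourIds[i], prices[i]);
            }
        }
        List<String> ordered = new ArrayList<>(minPrice.keySet());
        // Stable sort: groups with equal minimum price keep first-seen order.
        ordered.sort((a, b) -> Double.compare(minPrice.get(a), minPrice.get(b)));
        return ordered;
    }
}
```

Under this reading no group is skipped: every distinct nl_tour_id appears exactly once, and the group holding the cheapest doc comes first.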
Thanks for your reply,
sorry for my English, and
best regards,
David
> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: Next
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java,
> field-collapse-3.patch, field-collapse-4-with-solrj.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java,
> quasidistributed.additional.patch, SOLR-236-1_4_1.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch,
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch,
> SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)