[
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604964#comment-16604964
]
Lucene/Solr QA commented on SOLR-9418:
--------------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m
0s{color} | {color:green} The patch appears to include 2 new or modified test
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m
12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} |
{color:green} 4m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} |
{color:green} 4m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} |
{color:green} 4m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 43s{color}
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 95m 26s{color} |
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.sim.TestSimTriggerIntegration |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-9418 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12938525/SOLR-9418.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis
validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality |
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
|
| git revision | master / b4a1548 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| unit |
https://builds.apache.org/job/PreCommit-SOLR-Build/177/artifact/out/patch-unit-solr_core.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-SOLR-Build/177/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output |
https://builds.apache.org/job/PreCommit-SOLR-Build/177/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
> Statistical Phrase Identifier
> -----------------------------
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Akash Mehta
> Assignee: Hoss Man
> Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.patch, SOLR-9418.patch,
> SOLR-9418.zip
>
>
> h2. *Summary:*
> The Statistical Phrase Identifier is a Solr contribution that takes in a
> string of text and then leverages a language model (an Apache Lucene/Solr
> inverted index) to predict how the inputted text should be divided into
> phrases. The intended purpose of this tool is to parse short-text queries
> into phrases prior to executing a keyword search (as opposed to parsing out each
> keyword as a single term).
> It is being generously donated to the Solr project by CareerBuilder, with the
> original source code and a quickly demo-able version located here:
> [https://github.com/careerbuilder/statistical-phrase-identifier]
> h2. *Purpose:*
> Assume you're building a job search engine, and one of your users searches
> for the following:
> _machine learning research and development Portland, OR software engineer
> AND hadoop, java_
> Most search engines will natively parse this query into the following boolean
> representation:
> _(machine AND learning AND research AND development AND Portland) OR
> (software AND engineer AND hadoop AND java)_
> While this query may still yield relevant results, it is clear that the
> intent of the user wasn't understood very well at all. By leveraging the
> Statistical Phrase Identifier on this string prior to query parsing, you can
> instead expect the following parsing:
> _{machine learning} \{and} \{research and development} \{Portland, OR}
> \{software engineer} \{AND} \{hadoop,} \{java}_
> It is then possible to quote all the multi-word phrases prior to executing
> the search:
> _"machine learning" and "research and development" "Portland, OR" "software
> engineer" AND hadoop, java_
> Of course, you could do your own query parsing to specifically handle the
> boolean syntax, but the following would eventually be interpreted correctly
> by Apache Solr and most other search engines:
> _"machine learning" AND "research and development" AND "Portland, OR" AND
> "software engineer" AND hadoop AND java_
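The step from identified phrases to the final boolean query above can be sketched in a few lines of Python. This is only an illustration and not part of the patch: the function name `phrases_to_query`, the decision to drop standalone boolean operators, and the stripping of trailing commas are all assumptions chosen to reproduce the example query.

```python
# Hypothetical helper, not taken from the patch.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def phrases_to_query(phrases, operator="AND"):
    """Quote multi-word phrases and join everything with one boolean operator."""
    terms = []
    for phrase in phrases:
        # Assumption: trailing commas are punctuation noise, e.g. "hadoop,".
        cleaned = phrase.strip().rstrip(",").strip()
        if not cleaned:
            continue
        # Assumption: a phrase that is only an operator word is dropped,
        # because the join below supplies the operator explicitly.
        if cleaned.upper() in BOOLEAN_OPERATORS:
            continue
        if " " in cleaned:
            # Multi-word phrase: wrap in double quotes, escaping any inner quotes.
            terms.append('"' + cleaned.replace('"', '\\"') + '"')
        else:
            terms.append(cleaned)
    return f" {operator} ".join(terms)


identified = [
    "machine learning", "and", "research and development", "Portland, OR",
    "software engineer", "AND", "hadoop,", "java",
]
query = phrases_to_query(identified)
```

Note that "Portland, OR" keeps its internal comma and its "OR" because only whole-phrase operators and trailing commas are touched.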
> h2. *History:*
> This project was originally implemented by the search team at CareerBuilder
> in the summer of 2015 for use as part of their semantic search system. In the
> summer of 2016, Akash Mehta implemented a much simpler version as a proof of
> concept based upon publicly available information about the CareerBuilder
> implementation (the first attached patch). In July of 2018, CareerBuilder
> open sourced their original version
> ([https://github.com/careerbuilder/statistical-phrase-identifier])
> and agreed to also donate the code to the Apache Software Foundation as a
> Solr contribution. A Solr patch with the CareerBuilder version was added to
> this issue on September 5th, 2018, and community feedback and contributions
> are encouraged.
> This issue was originally titled the "Probabilistic Query Parser", but the
> name has now been updated to "Statistical Phrase Identifier" to avoid
> ambiguity with Solr's query parsers (per some of the feedback on this issue),
> as the implementation is actually just a mechanism for identifying phrases
> statistically from a string and is NOT a Solr query parser.
> h2. *Example usage:*
> h3. (See contrib readme or configuration files in the patch for full
> configuration details)
> h3. *{{Request:}}*
> {code:java}
> http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin
> skywalker toad x men magneto professor xavier{code}
> h3. *{{Response:}}*
> {code:java}
> {
> "responseHeader":{
> "status":0,
> "QTime":25},
> "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker}
> {toad} {x men} {magneto} {professor xavier}",
> "top_parsed_phrases":[
> "darth vader",
> "obi wan kenobi",
> "anakin skywalker",
> "toad",
> "x-men",
> "magneto",
> "professor xavier"],
> "potential_parsings":[{
> "parsed_phrases":["darth vader",
> "obi wan kenobi",
> "anakin skywalker",
> "toad",
> "x-men",
> "magneto",
> "professor xavier"],
> "parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker}
> {toad} {x-men} {magneto} {professor xavier}",
> "score":0.0}]}{code}
>
>
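The request in the example usage contains literal spaces, which a client would normally URL-encode. A minimal Python sketch of building that request and reading the phrases back is given here, assuming the `/parse` handler path and the `q` parameter behave as in the example. The helper names `build_parse_url` and `top_phrases` are hypothetical, and the response is a trimmed copy of the sample rather than the result of a live call.

```python
import json
from urllib.parse import urlencode


def build_parse_url(core_url, text):
    """Return the parse URL for `text`, with the query string URL-encoded."""
    # Assumption: the handler is mounted at <core>/parse, as in the example.
    return core_url.rstrip("/") + "/parse?" + urlencode({"q": text})


def top_phrases(response_body):
    """Extract the `top_parsed_phrases` list from a JSON response body."""
    return json.loads(response_body).get("top_parsed_phrases", [])


# Trimmed copy of the sample response, used instead of a live request.
sample_response = """
{
  "responseHeader": {"status": 0, "QTime": 25},
  "top_parsed_phrases": ["darth vader", "obi wan kenobi", "anakin skywalker"]
}
"""

url = build_parse_url("http://localhost:8983/solr/spi", "darth vader obi wan kenobi")
phrases = top_phrases(sample_response)
```

`urlencode` encodes spaces as `+`, which is valid in a query string; against a running Solr instance the URL could then be fetched with any HTTP client.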
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)