Hi Niclas,
"generalization" of the user agent "without including the versions numbers"... How will you separate Mozilla/5.0 (Browser) from Mozilla/5.0 (Googlebot)? And, going to the root of a problem... why do you use SOLR such a way? Is it search service showing different content depending on browser type (WAP vs. HTML)??? If it is, you are implementing so-called "business use case" improperly... Search Engine Results Pages (SERP) should not have dependency on User-Agent HTTP Request Header. But, raw TCP output may depend on it, and it is not SOLR/Lucene layer; it is upper layer... Tomcat Servlet Container, for instance, may generate different output depending whether it is mobile device (WAP) or browser (Mozilla compatible)... I don't know your use case specifics... as Ted mentioned, it's much better to post SOLR-specific questions in solr-u...@lucene.apache.org... -Fuad > -----Original Message----- > From: Niclas Rothman [mailto:n...@lechill.com] > Sent: February-05-10 6:12 PM > To: gene...@lucene.apache.org > Cc: java-user@lucene.apache.org > Subject: RE: Wildcard searches???? > > Hi Fuad and thanks for your reply! > > The first post I know now was a wrong approach, I should not have the > wildcard included in my index. > > However, I can't do as you suggest, to have the full user agent in the > index, that’s the whole idea actually. > > The reason can be explained like this, device manufactures are literally > spitting out new devices and updates all the time which generates new > user agents that are very similar, perhaps only a small version number > differs. > So what I need is to have a "generalization" of the user agent in my > index, to only have the start of the useragent without including the > versions numbers. > This way my index are all the time "up to date" even if users with new > version numbers access my search service, which in my app isn’t > significant but instead causing my problems.... 
>
> Example:
>
> I have 2 indexed documents whose useragents fields are partial:
> <doc>
>   <id>1</id>
>   <useragents>
>     Firefox
>     Mozilla/4.0+SonyEricsson
>   </useragents>
> </doc>
> <doc>
>   <id>2</id>
>   <useragents>
>     Firefox
>     Mozilla/4.0+SonyEricsson
>   </useragents>
> </doc>
>
> User A searches my app with a user agent such as:
>
> Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP8.4.1+UP.Link/6.3.1.20.0
>
> The search app will display both document 1 and 2, because his user
> agent starts exactly with the user agent pattern in my document.
>
> User B searches my app with a user agent such as (please note that
> this user agent differs from User A's near the end: JP9.5.1 instead of
> JP8.4.1):
>
> Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP9.5.1+UP.Link/6.3.1.20.0
>
> The search app will also display both document 1 and 2, because his
> user agent starts exactly with the user agent pattern in my document,
> even though the version number of the Java platform differs between
> User A and User B.
>
> If we now have a different index with FULL user agents, only User A
> would have documents returned; none of the documents' user agents
> would match User B's user agent because of the "silly" version
> number!!
>
> <doc>
>   <id>1</id>
>   <useragents>
>     Firefox
>     Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP8.4.1+UP.Link/6.3.1.20.0
>   </useragents>
> </doc>
> <doc>
>   <id>2</id>
>   <useragents>
>     Firefox
>     Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP8.4.1+UP.Link/6.3.1.20.0
>   </useragents>
> </doc>
>
> Can you see my problem?
> So the basic question is whether I can somehow write a query saying
> that a match should take place if the user's user agent starts with
> the value of a document's useragents field.
>
> In theory, startsWith function / logic is easy enough to implement in
> C# / T-SQL, but how on earth should I do this in Solr / Lucene?????
>
> Regards
>
> Niclas
>
>
> -----Original Message-----
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: 05 February 2010 22:49
> To: gene...@lucene.apache.org
> Cc: java-user@lucene.apache.org
> Subject: RE: Wildcard searches????
>
> Niclas,
>
> I looked at your initial post; you are creating a document with the
> field value "abc*" - nothing related to a "wildcard query"!
>
> Of course, query [useragents:abcdefghijklm] will return no results, and
> [q=useragents:abc] no results, but [q=useragents:abc*] will return
> something.
>
> text_rev is a specific SOLR type for _leading_ wildcard queries; you
> don't need it (you don't need _leading_ wildcard queries).
>
> At indexing time, instead of
> <doc>
>   <useragents>
>     Firefox*
>     Mozilla/4.0*
>   </useragents>
> </doc>
>
> You should index
> <doc>
>   <useragents>
>     Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP8.4.1+UP.Link/6.3.1.20.0
>   </useragents>
> </doc>
>
> And also, you need to choose the SOLR type properly; for instance,
> textTight or textgen, or even a non-tokenized string!
>
> And, query [q=useragents:moz*] will return this document (even if this
> field is nontokenized).
>
> -Fuad
>
> P.S. Don't use * when you create a Lucene document; use it as part of
> the query.
>
>
> > -----Original Message-----
> > From: Niclas Rothman [mailto:n...@lechill.com]
> > Sent: February-05-10 4:44 PM
> > To: gene...@lucene.apache.org
> > Cc: java-user@lucene.apache.org
> > Subject: RE: Wildcard searches????
> >
> > Ted, I'm using SOLR, but I can't figure out what fieldtype I should
> > use to get a query like this to work:
> >
> > q=useragents: abcdefghijklm
> >
> > where I have in my index one document with value "abc" in field
> > "useragents"
> >
> > That query results in 0 hits.
> >
> > If I issue this I get 1 hit of course (exact match)
> >
> > q=useragents: Mozilla
> >
> > My document definition in SOLR looks like:
> >
> > <fields>
> >   <field name="id" type="tint" indexed="true" stored="true" required="true" />
> >   <field name="useragents" type="text_rev" indexed="true" stored="true" required="false" multiValued="true" />
> > </fields>
> >
> > Any clue?
> >
> > Nic
> >
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> > Sent: 05 February 2010 21:18
> > To: gene...@lucene.apache.org
> > Cc: java-user@lucene.apache.org
> > Subject: Re: Wildcard searches????
> >
> > This is quite close.  You will have to break down the user agent that
> > is your query into the same kinds of pieces as you did for your
> > index.  Lucene will only do exact matching of terms during searching
> > (wildcard queries are handled by exploding the term into all possible
> > variants).
> >
> > Regarding the field type, you will probably have to customize that a
> > fair bit to make +'s be separators and such.  If you use SOLR to
> > index and query your data, then it will make sure that your
> > separation into tokens is compatible unless you are using shortened
> > forms like you mention here.
> >
> > On Fri, Feb 5, 2010 at 12:03 PM, Niclas Rothman <n...@lechill.com>
> > wrote:
> >
> > > Hi again Ted and many thanks for your efforts.
> > > Ok, just to be sure that we fully understand each other:
> > >
> > > In my index I will store partial useragents without any wildcards *,
> > > e.g.
> > >
> > > Fire (for Firefox)
> > > Inte (Internet Explorer)
> > > Moz (Mozilla)
> > >
> > > When I search my index at runtime for Media objects that are
> > > compatible with a useragent, e.g.:
> > >
> > > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
> > >
> > > Hopefully Lucene / Solr will serve me all Media objects that
> > > partially match my full user agent string, and perhaps also some
> > > mismatches. To be absolutely sure that I only show Media objects
> > > that are compatible, I will have to loop through the result set in
> > > my program to do a final test and exclude any mismatches.
> > >
> > > Is this what you are saying, Ted: that I can't do the whole process
> > > in Solr / Lucene, and that I need to do the final test in my
> > > program (C#)?
> > >
> > > Also, I'm using Solr 1.4; what fieldtype would you recommend for
> > > the useragent (tokenized)?
> > >
> > > Okay, let's see what you have to say about this.
> > > Please bear with me, I'm all new to Lucene and Solr!!
> > >
> > > Regards
> > > Niclas
> > >
> > >
> > > -----Original Message-----
> > > From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> > > Sent: 05 February 2010 20:43
> > > To: gene...@lucene.apache.org
> > > Cc: java-user@lucene.apache.org
> > > Subject: Re: Wildcard searches????
> > >
> > > Yes.  I think you have it.
> > >
> > > To explain in a bit more detail, I think that you should store a
> > > tokenized form of the user agents and should query using a
> > > tokenized form of your user agent.  This will retrieve documents
> > > that have partial matches to the user agent of interest.  Many of
> > > these matches, however, may not meet the requirements of the
> > > wildcard expression in the documents.
> > > As such, you will need to look at each retrieved document and
> > > retrieve the wildcard expression from each one in turn to test
> > > whether the original (untokenized) query satisfies the wildcard.
> > >
> > > If your wildcards are all of a positive nature as your example is,
> > > then this should work pretty well.
> > >
> > > On Fri, Feb 5, 2010 at 9:09 AM, Niclas Rothman <n...@lechill.com>
> > > wrote:
> > >
> > > > Hi Ted and thanks for all your efforts.
> > > > Listen, I'm a little bit lost here trying to understand what you
> > > > are trying to tell me :-)
> > > >
> > > > 1. I store my useragents in a field that is tokenized.
> > > > 2. Then when I search, you are saying that I should "scan" down
> > > >    the matches via a SOLR function, or what?
> > > > Are you referring to these functions in SOLR?
> > > >
> > > > http://wiki.apache.org/solr/FunctionQuery
> > > >
> > > > Sorry for not grasping this immediately!
> > > >
> > > > Regards Niclas
> > > >
> > > > -----Original Message-----
> > > > From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> > > > Sent: 05 February 2010 17:44
> > > > To: gene...@lucene.apache.org
> > > > Cc: java-user@lucene.apache.org
> > > > Subject: Re: Wildcard searches????
> > > >
> > > > Tokenize your user agent strings, then store the tokenized form
> > > > separately from the wild card.  At retrieval time, scan down the
> > > > matches and apply the wildcard from each document to your
> > > > original query.  The SOLR function query might be useful for
> > > > this, as would be a custom hit collector.
> > > >
> > > > On Fri, Feb 5, 2010 at 7:57 AM, Niclas Rothman <n...@lechill.com>
> > > > wrote:
> > > >
> > > > > Hi there, I'm facing a problem and would like to ask the
> > > > > community for some help.
> > > > >
> > > > > In my index I store browser useragent values as "wildcarded" /
> > > > > partial values, meaning that an indexed document should only
> > > > > be shown to an end user if his browser's useragent matches a
> > > > > wildcarded useragent in my document.
> > > > >
> > > > > So what I have is actually a "reversed" matching: the
> > > > > wildcards are in my document and NOT in my actual query.
> > > > > Does anyone know if this "setup" is possible, e.g. to execute
> > > > > a query in the style of:
> > > > >
> > > > > useragents:
> > > > > "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4+Profile/MIDP-2.1+Configuration/CLDC-1.1+JavaPlatform/JP-8.4.1+UP.Link/6.3.1.20.0"
> > > > >
> > > > > In this example I would have a hit because Mozilla/4.0*
> > > > > matches the useragent.
> > > > >
> > > > > <doc>
> > > > >   <useragents>
> > > > >     Firefox*
> > > > >     Mozilla/4.0*
> > > > >   </useragents>
> > > > > </doc>
> > > > >
> > > > > Regards
> > > > > Niclas
> > > > >
> > > >
> > > > --
> > > > Ted Dunning, CTO
> > > > DeepDyve
> > > >
> > >
> > > --
> > > Ted Dunning, CTO
> > > DeepDyve
> > >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
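
One way to get the "reversed" prefix match discussed in this thread out of exact term matching alone is to expand the query side instead of the index side: store each pattern verbatim (no trailing *) and look up every prefix of the incoming user agent. Below is a minimal Java sketch of that idea using only the standard library. The class ReversedPrefixMatch, its methods, and the in-memory map standing in for an index on a non-tokenized field are all hypothetical; they are not Solr or Lucene APIs. In Solr the equivalent would presumably be an OR of exact term clauses against a string field, which is an assumption and not something tested here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Solr or Lucene.
class ReversedPrefixMatch {

    // Stand-in for an index on a non-tokenized field: exact pattern -> document ids.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // "Index" a document: each pattern is stored verbatim, without a trailing '*'.
    void add(int docId, String... patterns) {
        for (String pattern : patterns) {
            index.computeIfAbsent(pattern, k -> new TreeSet<>()).add(docId);
        }
    }

    // Every non-empty prefix of the user agent, shortest first.
    static List<String> prefixes(String userAgent) {
        List<String> out = new ArrayList<>();
        for (int end = 1; end <= userAgent.length(); end++) {
            out.add(userAgent.substring(0, end));
        }
        return out;
    }

    // Ids of documents holding at least one pattern that the user agent starts with.
    Set<Integer> search(String userAgent) {
        Set<Integer> hits = new TreeSet<>();
        for (String prefix : prefixes(userAgent)) {
            Set<Integer> docs = index.get(prefix); // exact lookup, no wildcard needed
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ReversedPrefixMatch idx = new ReversedPrefixMatch();
        idx.add(1, "Firefox", "Mozilla/4.0+SonyEricsson");
        idx.add(2, "Opera");
        String ua = "Mozilla/4.0+SonyEricssonC905v/R1DE+Browser/NetFront/3.4";
        System.out.println(idx.search(ua)); // prints [1]
    }
}
```

With this approach both of Niclas's example users (JP8.4.1 and JP9.5.1) would get the same documents, since the stored pattern ends before the version number. The cost is one lookup (or one query clause) per character of the user agent, so it may be worth limiting the candidate prefixes if the stored patterns are known to end only at certain positions.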