Re: Some new SOLR features
Why restart Solr? Reloading a core may be sufficient. SOLR-561 already supports this.

On Thu, Sep 18, 2008 at 5:17 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
Servlets are one thing. For Solr the situation is different. There are always small changes people want to make: a new stop word, a small tweak to an analyzer. Rebooting the server for these should not be necessary. Ideally this is handled via a centralized console and deployed over the network (using RMI or XML) so that files do not need to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
Isn't this done in servlet containers for debugging-type work? Maybe as an option, but I disagree that this should drive anything in Solr. IMO it should really be turned off in production in servlet containers as well. This can be such a pain in the ass on a live site... someone touches web.xml and the app server reboots. *shudder* Seen it, don't dig it.

Jason Rutherglen wrote:
This should be done. Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:
My vote is for dynamically scanning a directory of configuration files. When a new one appears, or an existing file is touched, load it. When a configuration disappears, unload it. This model works very well for servlet containers.
Lance

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Wednesday, September 17, 2008 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Some new SOLR features

On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
> If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server.

Exactly. Actually, multi-core allows you to instantiate a completely new core and swap it for the old one, but it's a bit of a heavyweight approach. The key is finding the right granularity of change. My current thought is that a schema object would not be mutable, but that one could easily swap in a new schema object for an index at any time. That would allow a single request to see a stable view of the schema, while avoiding having to make every aspect of the schema thread-safe.

> Also I would like the configuration classes to just contain data and not have so many methods that operate on the filesystem.

That's the plan... completely separate the serialized and in-memory representations.

> This way the configuration object can be serialized, and loaded by the server dynamically. It would be great for the schema to work the same way.

Nothing will stop one from using Java serialization for config persistence; however, I am a fan of human-readable config files... so much easier to debug and support. Right now, people can cut-n-paste relevant parts of their config in email for support, or to a wiki to explain things, etc.

Of course, if you are talking about being able to have custom filters or analyzers (new classes that don't even exist on the server yet), then it does start to get interesting. This intersects with deployment in general... and I'm not sure what the right answer is. What if Lucene or Solr needs an upgrade? It would be nice if that could also automatically be handled in a large cluster... what are the options for handling that? Is there a role here for OSGi to play? It sounds like at least some of that is outside of the Solr domain.

An alternative to serializing everything would be to ship a new schema along with a new jar file containing the custom components.

-Yonik

--Noble Paul
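Yonik's idea of an immutable schema object, with a whole new one swapped in per index, can be sketched in Java with an AtomicReference. This is a minimal illustration under stated assumptions, not Solr's code: ImmutableSchema and SchemaHolder are hypothetical names, the "schema" is reduced to a field-name-to-type map, and Map.of needs Java 9 or later.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable schema, reduced to a field-name -> field-type map. */
final class ImmutableSchema {
    private final Map<String, String> fieldTypes;

    ImmutableSchema(Map<String, String> fieldTypes) {
        // Defensive copy, then wrap, so nobody can mutate this schema after construction.
        this.fieldTypes = Collections.unmodifiableMap(new LinkedHashMap<>(fieldTypes));
    }

    String typeOf(String field) {
        return fieldTypes.get(field);
    }
}

/** Hypothetical per-index holder that lets a whole new schema be swapped in at any time. */
final class SchemaHolder {
    private final AtomicReference<ImmutableSchema> current;

    SchemaHolder(ImmutableSchema initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** A request reads this once and keeps the result, so it sees one stable schema. */
    ImmutableSchema snapshot() {
        return current.get();
    }

    /** Publishes a new schema; requests already holding a snapshot are unaffected. */
    void swap(ImmutableSchema next) {
        current.set(next);
    }

    public static void main(String[] args) {
        SchemaHolder holder = new SchemaHolder(new ImmutableSchema(Map.of("title", "text")));
        ImmutableSchema seenByRequest = holder.snapshot();      // an in-flight request
        holder.swap(new ImmutableSchema(Map.of("title", "string")));
        System.out.println(seenByRequest.typeOf("title"));      // text
        System.out.println(holder.snapshot().typeOf("title"));  // string
    }
}
```

Only the reference is shared mutable state, so nothing inside the schema has to be thread-safe; a request that grabbed the old snapshot keeps using it until it finishes.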
Re: Can I add custom fields to the input XML file?
If you have custom XML, take a look at DataImportHandler: http://wiki.apache.org/solr/DataImportHandler

On Fri, Sep 19, 2008 at 12:24 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
The format is fixed; you can't change it. Something on the Solr end needs to parse that XML, and it expects specific XML elements and structure, so it can't handle whatever one throws at it.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: convoyer [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, September 19, 2008 1:26:06 AM
Subject: Can I add custom fields to the input XML file?

Hi guys. Is the XML format for inputting data a standard one, or can I change it? That is, instead of:
3007WFP Dell Widescreen UltraSharp 3007WFP Dell, Inc.
can I enter something like:
100100 BPO 1500 100200 ITES 2500
Thanks

View this message in context: http://www.nabble.com/Can-I-add-custom-fields-to-the-input-XML-file--tp19566431p19566431.html
Sent from the Solr - User mailing list archive at Nabble.com.

--Noble Paul
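Since the add-XML format is fixed, the alternative to DataImportHandler is to convert custom XML into that format before posting it. A minimal sketch using only the JDK's DOM parser, assuming a hypothetical layout in which every child of the root element is one record and every child of a record is named after a field already defined in schema.xml; CustomXmlToSolrAdd is an illustrative name, not a Solr class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical converter: each child element of the root is one record, and each
 * child element of a record is assumed to be named after a Solr field in schema.xml.
 */
final class CustomXmlToSolrAdd {
    static String convert(String customXml) {
        Document in;
        try {
            in = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(customXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input XML", e);
        }
        StringBuilder out = new StringBuilder("<add>");
        for (Node record : elements(in.getDocumentElement().getChildNodes())) {
            out.append("<doc>");
            for (Node field : elements(record.getChildNodes())) {
                // Element name becomes the Solr field name; its text becomes the value.
                out.append("<field name=\"").append(escape(field.getNodeName())).append("\">")
                   .append(escape(field.getTextContent())).append("</field>");
            }
            out.append("</doc>");
        }
        return out.append("</add>").toString();
    }

    /** Keeps only element nodes, skipping whitespace text between tags. */
    private static List<Node> elements(NodeList nodes) {
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add(nodes.item(i));
            }
        }
        return result;
    }

    /** Escapes characters that are special in XML text and attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The output can then be posted to Solr's update handler as usual; any element name that does not match a schema field will still be rejected by Solr.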
Re: Delta importing issues
If an entity is specified, e.g. entity=one&entity=two, the command will be run only for those entities. Absence of the entity parameter means all entities will be executed.

The last_index_time is another piece which must be improved. It is hard to get use cases; if users can give me more use cases, that would be great. One thing I have in mind is allowing users to store arbitrary properties through an API, say context.persistProperty(key, value), and to read them back using context.getPersistedProperty(key). This would be generic enough for users to get going. Thoughts?
--Noble

On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer [EMAIL PROTECTED] wrote:
Actually, how does ${dataimporter.last_index_time} know which entity I'm specifically updating? I feel like I'm missing something; can it work like that?
Thanks.
- Jon

On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
Question: if I issued dataimport?command=delta-import&entity=one,two,three, would this also hit items without a delta-import, like four, five, six, etc.? I'm trying to set something up and I ended up with 28k+ documents, which seems more like a full import. So do I need to do something like deltaQuery="" to say "no delta"? At the moment I don't have anything defined for those since I don't need it; just wondering what the proper behavior is supposed to be.
Thanks.
- Jon

--Noble Paul
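Noble's answer passes one entity parameter per entity (entity=one&entity=two) rather than the comma-separated list in Jon's question. A small Java helper that builds such a delta-import URL; the base URL is an assumption, DihUrl is a hypothetical name, and the URLEncoder overload taking a Charset needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class DihUrl {
    /**
     * Builds a delta-import URL that names each entity in its own "entity" parameter.
     * An empty list adds no entity parameter, which per Noble means all entities run.
     */
    static String deltaImport(String solrBaseUrl, List<String> entities) {
        StringBuilder url = new StringBuilder(solrBaseUrl).append("/dataimport?command=delta-import");
        for (String entity : entities) {
            url.append("&entity=").append(URLEncoder.encode(entity, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(deltaImport("http://localhost:8983/solr", List.of("one", "two")));
        // http://localhost:8983/solr/dataimport?command=delta-import&entity=one&entity=two
    }
}
```

When such a URL is used from a shell or crontab, it has to be quoted, otherwise the shell treats each & as "run in background" and drops the parameters after it.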
Re: Error running query inside data-config.xml
Just paste the fields in your schema so that we can help you better.

On Wed, Sep 24, 2008 at 12:33 PM, con [EMAIL PROTECTED] wrote:
Hi,
I haven't changed the schema. For the time being I am simply using the default schema.xml inside the conf directory. By "error" I meant no output values. But when I run http://localhost:8983/solr/dataimport?command=full-import, it shows:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
      </lst>
      <lst name="initArgs">
        <lst name="defaults">
          <str name="config">/data-config.xml</str>
        </lst>
      </lst>
      <str name="command">full-import</str>
      <str name="status">idle</str>
      <str name="importResponse"/>
      <lst name="statusMessages">
        <str name="Total Requests made to DataSource">1</str>
        <str name="Total Rows Fetched">152</str>
        <str name="Total Documents Skipped">0</str>
        <str name="Full Dump Started">2008-09-24 11:50:53</str>
        <str name="">Indexing completed. Added/Updated: 152 documents. Deleted 0 documents.</str>
        <str name="Committed">2008-09-24 11:50:54</str>
        <str name="Optimized">2008-09-24 11:50:54</str>
        <str name="Time taken ">0:0:1.169</str>
      </lst>
      <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
    </response>

As I understand it, that means Solr is able to find the results. I also ran the query externally through the query browser, and that also runs fine. But when I go and search through the admin UI, no response is displayed. Is there something I missed in the configuration file, or something else?
Expecting your reply/suggestion.
Thanks,
con

View this message in context: http://www.nabble.com/Error-running-query-inside-data-config.xml-tp19642540p19643099.html
Sent from the Solr - User mailing list archive at Nabble.com.

--Noble Paul
Re: NullPointerException
I don't know if the problem is with the date. Are cdt and mdt date fields in the DB?

On Fri, Sep 26, 2008 at 12:58 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:
I'm not sure why the NullPointerException is occurring. Is that the whole stack trace? The mdt and cdt fields are dates in schema.xml, but the format shown in the log is wrong. Look at the DateFormatTransformer in DataImportHandler, which can convert strings in your database to the correct date format needed by Solr.

On Thu, Sep 25, 2008 at 7:09 PM, Dinesh Gupta [EMAIL PROTECTED] wrote:
Hi All,
I have attached my file. I am getting an exception. Please suggest how to sort out this issue.

    WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146}, ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04 15:40:40.0}, mdt=mdt(1.0)={2008-09-23 17:47:44.0}, prc=prc(1.0)={600.00}}]
    java.lang.NullPointerException
        at org.apache.lucene.document.Document.getField(Document.java:140)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:283)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
        at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)

Regards,
Shalin Shekhar Mangar.

--Noble Paul
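DateFormatTransformer is the DIH-native fix Shalin points to. Purely to show the conversion involved, here is the same idea in plain Java (not the transformer itself): parse values like the cdt/mdt strings in the warning and print them in the ISO-8601 UTC form Solr date fields expect, such as 2001-09-04T15:40:40Z. The database time zone is an assumption the caller must supply, and the pattern assumes exactly one fractional-second digit, as in the logged values; SolrDates is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class SolrDates {
    // Matches values such as "2001-09-04 15:40:40.0" (one fractional-second digit).
    private static final DateTimeFormatter DB_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");

    /**
     * Converts a DB timestamp string to the UTC form Solr date fields expect.
     * sourceOffset is the (assumed) offset in which the DB values were recorded.
     */
    static String toSolrDate(String dbValue, ZoneOffset sourceOffset) {
        LocalDateTime local = LocalDateTime.parse(dbValue, DB_FORMAT);
        // ISO_INSTANT prints e.g. 2001-09-04T15:40:40Z and omits a zero fraction.
        return DateTimeFormatter.ISO_INSTANT.format(local.toInstant(sourceOffset));
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2001-09-04 15:40:40.0", ZoneOffset.UTC)); // 2001-09-04T15:40:40Z
    }
}
```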
Re: DataImportHandler: way to merge multiple db-rows to 1 doc using transformer?
On what basis do you merge rows? Then I may be able to suggest an easy way of doing that.

On Sun, Sep 28, 2008 at 3:17 AM, Britske [EMAIL PROTECTED] wrote:
I've been looking at the wiki and the code of DataImportHandler, and it looks impressive. There's talk about ways to use Transformers to create several rows (Solr docs) based on a single DB row. I'd like to know if it's possible to do the exact opposite: to build custom transformers that take multiple DB rows and merge them into a single Solr row/document. If so, how?
Thanks,
Britske

View this message in context: http://www.nabble.com/DataImportHandler%3A-way-to-merge-multiple-db-rows-to-1-doc-using-transformer--tp19706722p19706722.html
Sent from the Solr - User mailing list archive at Nabble.com.

--Noble Paul
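Noble's question matters because any merge needs a grouping key. Assuming the basis is a shared key column (a hypothetical choice), the merge itself is a group-by that turns every other column into a multi-valued field. A minimal plain-Java sketch over rows represented as maps; RowMerger is an illustrative name and this is not the DIH Transformer API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowMerger {
    /**
     * Merges DB rows that share the same value in keyColumn into a single document.
     * Every other column becomes a multi-valued field holding the values in row order.
     */
    static List<Map<String, List<Object>>> mergeByKey(List<Map<String, Object>> rows, String keyColumn) {
        Map<Object, Map<String, List<Object>>> docsByKey = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Object key = row.get(keyColumn);
            Map<String, List<Object>> doc = docsByKey.get(key);
            if (doc == null) {
                // First row for this key: start a new document carrying the key once.
                doc = new LinkedHashMap<>();
                List<Object> keyValue = new ArrayList<>();
                keyValue.add(key);
                doc.put(keyColumn, keyValue);
                docsByKey.put(key, doc);
            }
            for (Map.Entry<String, Object> column : row.entrySet()) {
                if (column.getKey().equals(keyColumn)) {
                    continue;
                }
                doc.computeIfAbsent(column.getKey(), c -> new ArrayList<>()).add(column.getValue());
            }
        }
        return new ArrayList<>(docsByKey.values());
    }
}
```

This version holds every document in memory until the end and does not need rows with equal keys to be adjacent; a streaming variant would instead rely on the SQL query ordering rows by the key and emit a document whenever the key changes.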
Re: using DataImportHandler instead of POST?
Yep, DIH can slurp an XML file: http://wiki.apache.org/solr/DataImportHandler#head-13ffe3a5e6ac22f08e063ad3315f5e7dda279bd4
Use a FileDataSource instead of HttpDataSource. If your XML is in the Solr add-XML format, use the attribute useSolrAddSchema="true". If your XML is really huge, use stream="true".
--Noble

On Mon, Sep 29, 2008 at 9:30 AM, Geoffrey Young [EMAIL PROTECTED] wrote:
Hi all :) I'm sorry I need to ask this, but after reading and re-reading the wiki I don't see a clear path... I have a well-formed XML file, suitable for POSTing to Solr. That works just fine. It's very large, though, and using curl in production is so very lame. Is there a very simple config that will let Solr just slurp up the file via the DataImportHandler? Solr already has everything it needs in schema.xml, so I don't think this would be very hard... if I fully understood the DataImportHandler :)
tia
--Geoff

--Noble Paul
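The point of stream="true" is that a huge file is not loaded into memory all at once. As an illustration of that streaming idea only (not DIH's implementation), a StAX pass over Solr's add-XML format can hand each document to a callback as soon as its closing doc tag is read. AddXmlStreamer and streamDocs are hypothetical names; the parser is the JDK's javax.xml.stream.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class AddXmlStreamer {
    /**
     * Reads Solr add-XML (add / doc / field name="...") with StAX, handing each finished
     * doc to onDoc so only one document is held in memory at a time. Returns the doc count.
     */
    static int streamDocs(Reader in, Consumer<Map<String, List<String>>> onDoc) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            Map<String, List<String>> doc = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("doc".equals(r.getLocalName())) {
                        doc = new LinkedHashMap<>();
                    } else if ("field".equals(r.getLocalName()) && doc != null) {
                        String name = r.getAttributeValue(null, "name");
                        // getElementText() consumes the text and leaves the reader on the field's end tag.
                        doc.computeIfAbsent(name, k -> new ArrayList<>()).add(r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(r.getLocalName()) && doc != null) {
                    onDoc.accept(doc);
                    count++;
                    doc = null;
                }
            }
            r.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed add-XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"id\">1</field></doc><doc><field name=\"id\">2</field></doc></add>";
        int n = streamDocs(new StringReader(xml), doc -> System.out.println(doc));
        System.out.println(n + " docs"); // 2 docs
    }
}
```

In real use the Reader would wrap the large file (for example via java.nio.file.Files.newBufferedReader) rather than a string.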
Re: Sample App needed
Which sample is not working?

On Mon, Sep 29, 2008 at 6:02 PM, Dinesh Gupta [EMAIL PROTECTED] wrote:
Hi all,
Is there a sample application other than the Solr sample? Please give me a sample application with which I can build indexes from the DB. I am not able to work with the sample app; it is too hard for a new user to understand. Please help me, otherwise I will have to quit Solr. Please tell me how to attach my file, to help you understand my problem.
Regards,
Dinesh Gupta

--Noble Paul
Re: Indexing Large Files with Large DataImport: Problems
I guess it is a threading problem. I can give you a patch. You can raise a bug.
--Noble

On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison [EMAIL PROTECTED] wrote:
As a follow-up: I continued tweaking the data-config.xml and have been able to make the commit fail with as few as 3 fields in the sdc.xml, with only one multivalued field. Even stranger, some fields work and some do not. For instance, in my dc.xml:

    <field column="Taxon" xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon" />
    ...
    <field column="GenPept" xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept" />

and in the schema.xml:

    <field name="GenPept" type="text" indexed="true" stored="false" multiValued="true" />
    ...
    <field name="Taxon" type="text" indexed="true" stored="false" multiValued="true" />

but Taxon works and GenPept does not. What could possibly account for this discrepancy? Again, the error logs from the server are exactly those seen in the first post. What is going on?

KyleMorrison wrote:
Yes, this is the most recent version of Solr, with stream="true", and stopwords, lowercase and removeDuplicate are being applied to all multivalued fields. Would the filters possibly be causing this? I will not use them and see what happens.
Kyle

Shalin Shekhar Mangar wrote:
Hmm, strange. This is Solr 1.3.0, right? Do you have any transformers applied to these multi-valued fields? Do you have stream="true" in the entity?

On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison [EMAIL PROTECTED] wrote:
I apologize for spamming this mailing list with my problems, but I'm at my wits' end. I'll get right to the point. I have an XML file of ~1GB which I wish to index. If that is successful, I will move to a larger file of closer to 20GB. However, when I run my data-config (let's call it dc.xml) over it, the import only manages to get about 27 rows, out of roughly 200K. The exact same data-config (dc.xml) works perfectly on smaller data files of the same type. This data-config is quite large, maybe 250 fields. When I run a smaller data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works perfectly. The only conclusion I can draw from this is that the data-config method just doesn't scale well. When the dc.xml fails, the server logs spit out:

    Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95
    Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
    INFO: Starting Full Import
    Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
    INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
    Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
    SEVERE: Full Import failed
    java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.next(AbstractList.java:343)
        at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
        at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
    Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77
    Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
    INFO: Starting Full Import
    Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
    INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
    Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
    SEVERE: Full Import failed
    java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.next(AbstractList.java:343)
        at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
        at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
        at
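The thread never shows Noble's patch, so the following is only a generic illustration, not the DIH code or the fix. The exception in the trace is thrown by a fail-fast list iterator inside DocBuilder.addFieldValue; such iterators throw ConcurrentModificationException when the list is structurally modified while being iterated, whether by another thread or by the same loop. The sketch reproduces that and shows one common remedy, iterating over a snapshot copy; ComodificationDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ComodificationDemo {
    /** Appends to the same list it is iterating; the fail-fast iterator throws on the next step. */
    static boolean failsWhenModifiedDuringIteration() {
        List<String> values = new ArrayList<>(List.of("a", "b"));
        try {
            for (String v : values) {
                values.add(v + "-copy"); // structural change while an iterator is open
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterating over a snapshot copy lets the original list be modified safely (single thread). */
    static List<String> modifyWhileIteratingSnapshot() {
        List<String> values = new ArrayList<>(List.of("a", "b"));
        for (String v : new ArrayList<>(values)) {
            values.add(v + "-copy");
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(failsWhenModifiedDuringIteration()); // true
        System.out.println(modifyWhileIteratingSnapshot());     // [a, b, a-copy, b-copy]
    }
}
```

If the modification really comes from another thread, as Noble suspects, a snapshot copy alone is not enough: the list also needs synchronization or a concurrent collection such as java.util.concurrent.CopyOnWriteArrayList.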
Re: Indexing Large Files with Large DataImport: Problems
This patch is created from 1.3 (it may apply on trunk also).
--Noble

On Wed, Oct 1, 2008 at 9:56 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
I guess it is a threading problem. I can give you a patch. You can raise a bug.
--Noble
Re: How to select one entity at a time?
The entity and the select query have no relationship. The entity comes into the picture when you do a data import, e.g. http://localhost:8983/solr/dataimport?command=full-import&entity=user
This is an indexing operation.

On Wed, Oct 1, 2008 at 11:26 AM, con [EMAIL PROTECTED] wrote:
Hi guys,
In the URL http://localhost:8983/solr/select/?q= :bob&version=2.2&start=0&rows=10&indent=on&wt=json, q=: applies to a field and not to an entity. So if I have 3 entities like:

    <dataConfig>
      <dataSource **/>
      <document>
        <entity name="user" query="select * from USER"></entity>
        <entity name="manager" query="select * from MANAGERS"></entity>
        <entity name="both" query="select * from MANAGERS,USER where MANAGERS.userID=USER.userID"></entity>
      </document>
    </dataConfig>

I cannot invoke the entity 'user' with a URL like the one above. I went through the possible arguments but didn't find a way to invoke an entity. Is there a way to do this?
Regards,
con

con wrote:
Thanks, everybody. I have gone through the wiki and some other docs. Actually I have a tight schedule and I have to look into various other things along with this. Currently I am looking into rebuilding Solr by writing a wrapper class. I will update you with more meaningful questions soon.
Thanks and regards,
con

Norberto Meijome-6 wrote:
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con [EMAIL PROTECTED] wrote:
What you meant is correct. Please excuse me for that; I am new to Solr. :-(

Con, have a read here: http://www.ibm.com/developerworks/java/library/j-solr1/
It helped me pick up the basics a while back. It refers to 1.2, but the core concepts are relevant to 1.3 too.
b
{Beto|Norberto|Numard} Meijome

View this message in context: http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html
Sent from the Solr - User mailing list archive at Nabble.com.

--Noble Paul
Re: DIH - Full imports + ?entity=param
DIH does not know which rows were created by that entity, so we do not really have any knowledge of how to delete specific rows. How about passing a deleteQuery=type:x in the request params, or having a deleteByQuery on each top-level entity, which can be used when that entity is doing a full-import?
--Noble

On Fri, Oct 3, 2008 at 4:32 AM, Jon Baer [EMAIL PROTECTED] wrote:
Just curious: currently a full-import call does a delete-all even when appending an entity param... wouldn't it be possible to pick up the param and just delete on that entity somehow? It would be nice if there were something involving an entity field name that worked with DIH to allow better introspection like that... Is that something which is currently doable?
Thanks.
- Jon

--Noble Paul
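Noble's first suggestion relies on each document recording which entity produced it. Assuming a hypothetical "type" field holding the entity name (the field in his deleteQuery=type:x example), one entity's documents could be removed with Solr's standard XML delete-by-query message, POSTed to the /update handler. A minimal Java sketch that builds that message; EntityDelete is an illustrative name, and the set of escaped characters is modeled on, but not guaranteed identical to, SolrJ's ClientUtils.escapeQueryChars.

```java
final class EntityDelete {
    /**
     * Builds Solr's XML delete-by-query message for all documents whose typeField
     * equals entityName, e.g. delete / query type:user.
     */
    static String deleteByEntityXml(String typeField, String entityName) {
        String query = typeField + ":" + escapeQueryChars(entityName);
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    /** Backslash-escapes Lucene query syntax characters so the name is matched literally. */
    private static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Escapes the characters that are special inside XML element content. */
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(deleteByEntityXml("type", "user")); // <delete><query>type:user</query></delete>
    }
}
```

This only works if every entity's documents carry that field, which is exactly the bookkeeping DIH lacks today according to Noble's reply.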
Re: Indexing Large Files with Large DataImport: Problems
Did you get a chance to test with the patch? Did it work?

On Wed, Oct 1, 2008 at 10:13 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
This patch is created from 1.3 (it may apply on trunk also).
--Noble
Re: command is still running ? delta-import?
Maybe you can take a thread dump of Solr; it may throw some light on what it is up to. On *nix: kill -3 <pid> --Noble

On Mon, Oct 6, 2008 at 6:25 PM, sunnyfr [EMAIL PROTECTED] wrote:
Hi, "A command is still running..." How can I find out exactly what it is doing? It really looks like it is doing something, but I have a cron job running delta-import every five minutes and nothing changes: for the last 20 minutes, I would say, nothing has changed at all, neither rows fetched nor requests made. I checked the log files and there is no exception. It's the first time this delta-import has been started after a Tomcat restart, so maybe it's normal. How can I check what it is really doing right now?

[EMAIL PROTECTED]:/data/solr/books/data# ps aux | grep solr
tomcat55 18780 0.0 0.0  9452 1512 ?     S  10:45 0:00 /bin/bash /data/solr/video/bin/snapshooter
root     18786 0.0 0.0 23112 1216 ?     S  10:45 0:00 sudo -u root /data/solr/video/bin/snapshooter
root     18851 0.0 0.0  3944  620 pts/1 S+ 11:17 0:00 grep solr

Thanks a lot guys for all your help,

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
  <str name="Time Elapsed">0:28:38.8</str>
  <str name="Total Requests made to DataSource">393</str>
  <str name="Total Rows Fetched">9184</str>
  <str name="Total Documents Processed">55</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Delta Dump started">2008-10-06 10:45:01</str>
  <str name="Identifying Delta">2008-10-06 10:45:01</str>
  <str name="Deltas Obtained">2008-10-06 10:45:44</str>
  <str name="Building documents">2008-10-06 10:45:44</str>
  <str name="Total Changed Documents">8863</str>
</lst>

Still the same, 3 hours later:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
  <str name="Time Elapsed">4:1:51.416</str>
  <str name="Total Requests made to DataSource">393</str>
  <str name="Total Rows Fetched">9184</str>
  <str name="Total Documents Processed">55</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Delta Dump started">2008-10-06 10:45:01</str>
  <str name="Identifying Delta">2008-10-06 10:45:01</str>
  <str name="Deltas Obtained">2008-10-06 10:45:44</str>
  <str name="Building documents">2008-10-06 10:45:44</str>
  <str name="Total Changed Documents">8863</str>
</lst>

And my logs:

Oct 6 10:45:01 solr-test /USR/SBIN/CRON[18775]: (root) CMD (/usr/bin/wget -q --output-document=/home/tot.txt http://solr-test.adm.video.com:8180/solr/video/dataimport?command=delta-import&entity=b)
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection.
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: rel_group_ids
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: rel_group_ids rows obtained : 0
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running DeletedRowKey() for Entity: rel_group_ids
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: rel_group_ids rows obtained : 0
Oct 6 10:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: rel_group_ids
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: rel_playlist_ids
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: rel_playlist_ids rows obtained : 0
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running DeletedRowKey() for Entity: rel_playlist_ids
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: rel_playlist_ids rows obtained : 0
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: rel_playlist_ids
Oct 6, 2008 10:45:01 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Ru Oct 6 10:45:01 solr-test jsvc.exec[18716]: fan_ids
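The status output above can be checked mechanically: if the progress counters in statusMessages are identical across two polls taken minutes apart while the status stays busy, the import is probably stuck rather than slow. A minimal sketch using only the JDK's DOM parser; the class name is hypothetical, and it assumes the response shape quoted above (a <lst name="statusMessages"> holding <str name="..."> entries).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DihStallCheck {
    // Progress counters taken from the statusMessages list quoted above.
    private static final String[] COUNTERS = {
        "Total Requests made to DataSource",
        "Total Rows Fetched",
        "Total Documents Processed"
    };

    // Collects the <str name="..."> children of <lst name="statusMessages">.
    static Map<String, String> statusMessages(String xml) {
        Map<String, String> out = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList lists = doc.getElementsByTagName("lst");
            for (int i = 0; i < lists.getLength(); i++) {
                Element lst = (Element) lists.item(i);
                if (!"statusMessages".equals(lst.getAttribute("name"))) continue;
                NodeList strs = lst.getElementsByTagName("str");
                for (int j = 0; j < strs.getLength(); j++) {
                    Element s = (Element) strs.item(j);
                    out.put(s.getAttribute("name"), s.getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable status response", e);
        }
        return out;
    }

    // True when none of the progress counters moved between two status polls.
    static boolean looksStalled(String earlierXml, String laterXml) {
        Map<String, String> before = statusMessages(earlierXml);
        Map<String, String> after = statusMessages(laterXml);
        for (String key : COUNTERS) {
            if (!Objects.equals(before.get(key), after.get(key))) return false;
        }
        return true;
    }
}
```

Usage: fetch the handler's status (for example .../dataimport?command=status) twice a few minutes apart and pass both bodies to looksStalled. Identical counters only suggest a stall; a thread dump shows where the import is actually waiting.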
Re: command is still running ? delta-import?
*nix refers to the different flavours of Unix; sorry, it is not a command. I assumed that you are using a Linux/Unix system, where kill -3 <pid> sends signal 3 (SIGQUIT) to the Java process and makes the JVM print a thread dump to its standard output. If you are using Windows, press Ctrl+Break in that console window.

On Mon, Oct 6, 2008 at 11:38 PM, sunnyfr [EMAIL PROTECTED] wrote:
What does that mean: kill -3 pid? Is this a command too, in *nix? Thanks for your advice.
Noble Paul നോബിള് नोब्ळ् wrote: Maybe you can take a thread dump of Solr; it may throw some light on what it is up to. On *nix: kill -3 <pid> --Noble
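kill -3 asks a HotSpot JVM to print the stack of every thread to wherever its standard output goes (for example catalina.out in a default Tomcat setup). The same information is available from inside a JVM through the standard Thread.getAllStackTraces() call, which can be handy when you want to log it yourself. A minimal sketch; the class name is hypothetical, and the output format is simpler than the one the signal handler produces.

```java
import java.util.Map;

class ThreadDump {
    // Builds a plain-text dump of every live thread's name, state and stack frames.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

In a dump taken during a stuck import, the interesting threads are the ones with frames from org.apache.solr.handler.dataimport; whether they sit in a JDBC socket read or somewhere else narrows down the cause.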
Re: command is still running ? delta-import?
Oct 6 11:36:36 solr-test jsvc.exec[18716]: Oct 6, 2008 11:36:36 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 6 11:36:37 solr-test jsvc.exec[18716]: Oct 6, 2008 11:36:37 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 6 11:40:01 solr-test /USR/SBIN/CRON[18885]: (root) CMD (/root/memmon.sh /root/memmon.txt)
Oct 6 11:40:01 solr-test /USR/SBIN/CRON[1]: (root) CMD (/usr/bin/wget -q --output-document=/home/tot.txt http://solr-test.adm.videos.com:8180/solr/videos/dataimport?command=delta-import&entity=b)
Oct 6 11:40:01 solr-test jsvc.exec[18716]: Oct 6, 2008 11:40:01 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0
Oct 6 11:45:01 solr-test /USR/SBIN/CRON[18896]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r $DEFAULT ] && . $DEFAULT ; [ $ENABLED = true ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
Oct 6 11:45:01 solr-test /USR/SBIN/CRON[18898]: (root) CMD (/usr/bin/wget -q --output-document=/home/tot.txt http://solr-test.adm.videos.com:8180/solr/videos/dataimport?command=delta-import&entity=b)
Oct 6 11:45:01 solr-test jsvc.exec[18716]: Oct 6, 2008 11:45:01 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0

sunnyfr wrote:
Nothing changed. I did that last night: I stopped the Java processes (ps aux | grep java, then kill -p on every Java pid), restarted, and this morning it is still stuck. Only the timing changes when I refresh the page; the rest stays the same, and there is no exception. I have no idea.

<lst name="statusMessages">
  <str name="Time Elapsed">8:58:54.139</str>
  <str name="Total Requests made to DataSource">225</str>
  <str name="Total Rows Fetched">196420</str>
  <str name="Total Documents Processed">31</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Delta Dump started">2008-10-06 23:50:01</str>
  <str name="Identifying Delta">2008-10-06 23:50:01</str>
  <str name="Deltas Obtained">2008-10-06 23:50:49</str>
  <str name="Building documents">2008-10-06 23:50:49</str>
  <str name="Total Changed Documents">196352</str>
</lst>
Re: Mixing XPathEntityProcessor and JdbcDataSource
There is no direct way. I'd suggest a better solution: use JdbcDataSource to get the XML as a CLOB/string, and write a transformer that takes that field and extracts fields out of the XML. There is a class called XPathRecordReader, used by XPathEntityProcessor, which you can use directly as an API. It is pretty trivial to use; if you need help with the API, look at the JUnit tests: http://svn.apache.org/viewvc/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java?revision=681182&view=markup --Noble

On Wed, Oct 8, 2008 at 3:14 AM, Manuel Carrasco [EMAIL PROTECTED] wrote: Hello guys, do you know a way to use XPathEntityProcessor with data coming from an XML field stored in a database? Thanks, Manolo -- --Noble Paul
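Noble's suggestion (fetch the XML column as a string through JdbcDataSource, then pull fields out of it in a transformer) can be sketched with the JDK's javax.xml.xpath package. This swaps in the standard XPath API in place of DIH's XPathRecordReader so the sketch runs without Solr on the classpath; the class name, the source column name and the field-to-XPath map are hypothetical. The method is shaped like a DIH Transformer's transformRow, minus the Context argument.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XmlColumnTransformer {
    private final String sourceColumn;
    private final Map<String, String> fieldToXPath;

    XmlColumnTransformer(String sourceColumn, Map<String, String> fieldToXPath) {
        this.sourceColumn = sourceColumn;
        this.fieldToXPath = fieldToXPath;
    }

    // Reads the XML string from one column and adds one new column per XPath expression.
    Map<String, Object> transformRow(Map<String, Object> row) {
        Object raw = row.get(sourceColumn);
        if (raw == null) return row; // nothing to extract for this row
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> e : fieldToXPath.entrySet()) {
            try {
                String value = xpath.evaluate(e.getValue(), new InputSource(new StringReader(raw.toString())));
                if (!value.isEmpty()) row.put(e.getKey(), value);
            } catch (XPathExpressionException ex) {
                throw new IllegalArgumentException("Bad XPath or XML for field " + e.getKey(), ex);
            }
        }
        return row;
    }
}
```

This re-parses the XML once per field for brevity; with many fields it would be cheaper to parse once into a DOM and evaluate every expression against that.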
Re: Problem in using Unique key
<uniqueKey required="false">userID</uniqueKey>
I do not think there is a required attribute on uniqueKey. By default the uniqueKey is required. If you do not want it to be required, remove the tag itself, i.e. have no uniqueKey. --Noble

On Wed, Oct 8, 2008 at 1:17 PM, con [EMAIL PROTECTED] wrote:
Hi guys, I am indexing values from an Oracle DB and then performing searches. Since I have to search multiple tables that are in no way related to each other, I have changed the uniqueKey constraint in schema.xml to false:
<uniqueKey required="false">userID</uniqueKey>
But when I do indexing, the values from the table that does not have the column USERID are not getting indexed:
WARNING: Error creating document : SolrInputDocumnt[{rowtype=rowtype(1.0)={role}, INVOICEID=INVOICEID(1.0)[EMAIL PROTECTED], RATE=RATE(1.0)={1000}}]
org.apache.solr.common.SolrException: Document [null] missing required field: USERID
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
8 Oct, 2008 12:54:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
Is there something I missed or did wrong? Thanks in advance. con
-- View this message in context: http://www.nabble.com/Problem-in-using-Unique-key-tp19873980p19873980.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: DIH: inner select fails when outer entity is null/empty
Set onError="skip" on the inner entity.

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, here is a newbie DataImportHandler question: Currently, I have entities with entities. There are some situations where a column value from the outer entity is null, and when I try to use it in the inner entity, the null just gets replaced with an empty string. That in turn causes the SQL query in the inner entity to fail. This seems like a common problem, but I couldn't find any solutions or mention in the FAQ (http://wiki.apache.org/solr/DataImportHandlerFaq). What is the best practice to avoid or convert null values to something safer? Would this be done via a Transformer or is there a better mechanism for this? I think the problem I'm describing is similar to what was described here: http://search-lucene.com/m/cjlhtFkG6m ... except I don't have the luxury of rewriting the SQL selects. Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -- - Noble Paul | Systems Architect| AOL | http://aol.com
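Roughly, onError="skip" tells DIH to drop the document whose entity threw, instead of aborting the whole import (the default, abort). The loop below is a simplified model of that behaviour, not DIH's own code; the class name, the string "documents" and the inner-query function are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SkipOnErrorModel {
    // For each parent key run the inner lookup; a failure skips only that document.
    static List<String> buildDocs(List<String> parentKeys, Function<String, String> innerQuery) {
        List<String> docs = new ArrayList<>();
        for (String key : parentKeys) {
            try {
                docs.add(key + ":" + innerQuery.apply(key));
            } catch (RuntimeException e) {
                // onError="skip": report and move on; "abort" would rethrow here instead
                System.err.println("Skipping document " + key + ": " + e.getMessage());
            }
        }
        return docs;
    }
}
```

Note that in this model the whole document is dropped, not just the inner fields, so outer rows whose key is null would not be indexed at all; check that this is acceptable before relying on skip.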
Re: Custom DIH variables
You can use custom parameters from the request, like ${dataimporter.request.foo}. Pass the value of foo as a request parameter, say foo=bar.

On Wed, May 5, 2010 at 6:05 AM, Blargy zman...@hotmail.com wrote: Can someone please point me in the right direction (classes) on how to create my own custom DIH variable that can be used in my data-config.xml? So instead of ${dataimporter.last_index_time} I want to be able to create ${dataimporter.foo}. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p777696.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
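On the client side this only means adding the extra parameter to the import request; DIH then resolves ${dataimporter.request.foo} in data-config.xml to that value. A sketch that builds such a URL with URL-encoded keys and values; the class name, host, port and handler path are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class DihRequestUrl {
    // Appends each parameter as key=value, URL-encoding both sides.
    static String build(String handlerUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", handlerUrl + "?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the generated URL predictable in logs.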
Re: Custom DIH variables
OK, you can't write a variable, but you can write a function (an Evaluator). It will look something like ${dataimporter.functions.foo()}: http://wiki.apache.org/solr/DataImportHandler#Custom_formatting_in_query_and_url_using_Functions

On Wed, May 5, 2010 at 9:12 PM, Blargy zman...@hotmail.com wrote: Thanks Paul, that will certainly work. I was just hoping there was a way I could write my own class that would inject this value as needed instead of precomputing this value and then passing it along in the params. My specific use case is: instead of using dataimporter.last_index_time I want to use something like dataimporter.updated_time_of_last_document. Our DIH is set up to use a bunch of slave databases and there have been problems with some documents getting lost due to replication lag. I would prefer to compute this value using a custom variable at runtime instead of passing it along via the params. Is that even possible? If not, I'll have to go with your previous suggestion. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p779278.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
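The body of such a function is ordinary Java. As one hypothetical example for Blargy's replication-lag case, the sketch below takes the update time of the newest indexed document and reaches back by a safety margin, so rows that arrived late on a lagging slave are picked up again. In DIH this logic would sit inside an Evaluator subclass registered as described on the wiki page linked above; the Solr types are left out so the sketch runs standalone, and the class name, margin and output layout (yyyy-MM-dd HH:mm:ss, assumed to match how ${dataimporter.last_index_time} is rendered) are assumptions.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LagSafeTimestamp {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Lower bound for the delta query: newest document time minus a replication-lag margin.
    static String evaluate(LocalDateTime newestDocumentUpdated, Duration replicationLagMargin) {
        return newestDocumentUpdated.minus(replicationLagMargin).format(FORMAT);
    }
}
```

Re-reading a small overlap window is usually harmless here, because documents with the same uniqueKey simply overwrite each other.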
Re: Custom DIH EventListeners
Nope. Register any event listener and check context.currentProcess() to figure out what the event is.

On Thu, May 6, 2010 at 8:21 AM, Blargy zman...@hotmail.com wrote: I know one can create custom event listeners for update or query events, but is it possible to create one for any DIH event (Full-Import, Delta-Import)? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-EventListeners-tp780517p780517.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
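A listener that reacts differently to full and delta imports then only needs to branch on that string. A standalone sketch: the static method stands in for an EventListener's onEvent(Context) and takes the value of context.currentProcess() directly; the class name is hypothetical, and the process names FULL_DUMP and DELTA_DUMP are assumptions to check against the Context constants in your DIH version.

```java
class ImportEventRouter {
    // Maps the value of context.currentProcess() to the kind of import that fired the event.
    static String describe(String currentProcess) {
        switch (currentProcess) {
            case "FULL_DUMP":
                return "full-import";
            case "DELTA_DUMP":
                return "delta-import";
            default:
                return "other:" + currentProcess;
        }
    }
}
```

Keeping the branching in one small method makes it easy to register the same listener class for both import types.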
Re: Custom DIH variables
You can use the core from this API together with EmbeddedSolrServer (part of SolrJ), so the calls will be in-VM.

On Thu, May 6, 2010 at 6:08 AM, Blargy zman...@hotmail.com wrote: Thanks Noble, this is exactly what I was looking for. What is the preferred way to query Solr within these sorts of classes? Should I grab the core from the context that is being passed in? Should I be using SolrJ? Can you provide an example and/or some tutorials/documentation? Once again, thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p780332.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Issue with delta import (not finding data in a column)
Are you reusing the context object? It may help if you can paste the relevant part of your code.

On 10 May 2010 19:03, ahammad ahmed.ham...@gmail.com wrote: I have a Solr core that retrieves data from an Oracle DB. The DB table has a few columns, one of which is a Blob that represents a PDF document. In order to retrieve the actual content of the PDF file, I wrote a Blob transformer that converts the Blob into the PDF file, and subsequently reads it using PDFBox. The blob is contained in a DB column called DOCUMENT, and the data goes into a Solr field called fileContent, which is required. This works fine when doing full imports, but it fails for delta imports. I debugged my transformer, and it appears that when it attempts to fetch the blob stored in the column, it gets nothing back (i.e. null). Because the data is essentially null, it cannot retrieve anything, and cannot store anything into Solr. As a result, the document does not get imported. I am not sure what the problem is, because this only occurs with delta imports. Here is my data-config file:
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="address" user="user" password="pass"/>
  <document name="table1">
    <entity name="TABLE1" pk="ID" query="select * from TABLE1"
            deltaImportQuery="select * from TABLE1 where ID ='${dataimporter.delta.ID}'"
            deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS'))"
            transformer="BlobTransformer">
      <field column="ID" name="id" />
      <field column="TITLE" name="title" />
      <field column="FILENAME" name="filename" />
      <field column="DOCUMENT" name="fileContent" blob="true"/>
      <field column="LASTMODIFIED" name="lastModified" />
    </entity>
  </document>
</dataConfig>
Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p788993.html Sent from the Solr - User mailing list archive at Nabble.com.
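While checking for a reused Context, it may also help to have the transformer read the BLOB into a byte array immediately and report rows where the driver returned no BLOB; that shows whether the delta path is the one losing the column. A standalone sketch using only java.sql types; the DOCUMENT and ID column names come from the data-config above, while the class name and the fileBytes key are hypothetical, and the PDFBox step is left out.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Map;

class BlobColumnReader {
    // Copies the BLOB in DOCUMENT into a byte[] under "fileBytes"; when the driver
    // returned no BLOB, the row is left unchanged and the ID is reported.
    static Map<String, Object> transformRow(Map<String, Object> row) {
        Object value = row.get("DOCUMENT");
        if (!(value instanceof Blob)) {
            System.err.println("No BLOB for ID " + row.get("ID") + " (got " + value + ")");
            return row;
        }
        Blob blob = (Blob) value;
        try {
            long length = blob.length();
            row.put("fileBytes", blob.getBytes(1, (int) length)); // BLOB positions start at 1
        } catch (SQLException e) {
            throw new IllegalStateException("Could not read BLOB for ID " + row.get("ID"), e);
        }
        return row;
    }
}
```

Running the same row through this check during a full-import and a delta-import makes the difference between the two paths visible in the log.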
Re: TikaEntityProcessor on Solr 1.4?
I guess it should work, because TikaEntityProcessor does not use any APIs newer than 1.4.

On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote: Sorry to repeat this question, but I realized that it probably belonged in its own thread: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4? (I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9) The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved: https://issues.apache.org/jira/browse/SOLR-1583 https://issues.apache.org/jira/browse/SOLR-1358 Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
Just copying the dih-extras jar file from the nightly build should be fine.

On Sat, May 22, 2010 at 3:12 AM, Sixten Otto six...@sfko.com wrote: On Fri, May 21, 2010 at 5:30 PM, Chris Harris rygu...@gmail.com wrote: Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583, what I did was to merge in all DataImportHandler-related changes from between the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option. I'd rather, of course, not have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.) How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets? I'll be very interested to hear how your testing goes! Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Dynamic dataConfig files in DIH
On Fri, Jun 11, 2010 at 11:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: Is there a way to dynamically point which dataConfig file to use to import
: using DIH without using the defaults hardcoded in solrconfig.xml?
What do you mean by "dynamically"? ... It's a query param, so you can specify the file name in the URL when you issue the command.
-Hoss

No, it is not: it is not reloaded for every request. We should enhance DIH to do so. But the whole data-config file can be sent as a request param and it works (this is used by the DIH debug mode).
-- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr DataConfig / DIH Question
This looks like a common problem. I guess DIH should handle this more gracefully: instead of firing a query and failing, it should not fire a query if any of the values are missing. This can be made configurable if needed.

On Sun, Jun 13, 2010 at 9:14 AM, Lance Norskog goks...@gmail.com wrote: This is a slow way to do this; databases are capable of doing this join and feeding the results very efficiently. The 'skipDoc' feature allows you to break out of the processing chain after the first query. It is used in the Wikipedia example. http://wiki.apache.org/solr/DataImportHandler

On Sat, Jun 12, 2010 at 6:37 PM, Holmes, Charles V. chol...@mitre.org wrote: I'm putting together an entity. A simplified version of the database schema is below. There is a 1-[0,1] relationship between Person and Address, with address_id being the nullable foreign key. If it makes any difference, I'm using SQL Server 2005 on the backend.
Person [id (pk), name, address_id (fk)]
Address [id (pk), zipcode]
My data config looks like the one below. This naturally fails when the address_id is null, since the query ends up being select * from user.address where id = .
<entity name="person" Query="select * from user.person">
  <entity name="address" Query="select * from user.address where id = ${person.address_id}">
  </entity>
</entity>
I've worked around it by using a config like this one. However, this makes the queries quite complex for some of my larger joins.
<entity name="person" Query="select * from user.person">
  <entity name="address" Query="select * from user.address where id = (select address_id from user.person where id = ${person.id})">
  </entity>
</entity>
Is there a cleaner / better way of handling these types of relationships? I've also tried to specify a default in the Solr schema, but that seems to only work after all the data is indexed, which makes sense but surprised me initially. BTW, thanks for the great DIH tutorial on the wiki! Thanks! Charles -- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
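The behaviour Noble proposes (not firing the inner query when a referenced value is missing) is not an existing DIH option; outside DIH it can be modelled as a template resolver that returns no query at all instead of splicing in an empty string. A standalone sketch; the class name and the lookup map of ${name} values are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullSafeQueryTemplate {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces ${name} placeholders; returns empty (no query) if any value is null or blank.
    static Optional<String> resolve(String template, Map<String, Object> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            if (value == null || value.toString().trim().isEmpty()) {
                return Optional.empty(); // skip the inner query entirely
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(sb);
        return Optional.of(sb.toString());
    }
}
```

Splicing values into SQL text is only reasonable for trusted keys such as a numeric address_id; a production version would bind them as query parameters instead.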
Re: Throttling replication
There is currently no way to throttle replication; it consumes all the available bandwidth. It would be a nice-to-have feature.

On Thu, Sep 2, 2010 at 8:11 PM, Mark static.void@gmail.com wrote: Is there any way, or a forthcoming patch, that would allow configuring how much network bandwidth (and ultimately disk I/O) a slave is allowed during replication? We currently have the problem that while replicating, our disk I/O goes through the roof. I would much rather have the replication take 2x as long with half the disk I/O. Any thoughts? Thanks -- - Noble Paul | Systems Architect| AOL | http://aol.com
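To make the requested feature concrete, this is what a bandwidth cap on a copy loop looks like in plain Java. It is a generic sketch, not code from Solr's ReplicationHandler; the class name is hypothetical and the rate is an arbitrary parameter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ThrottledCopy {
    // Copies in to out, sleeping whenever the copy runs ahead of bytesPerSecond.
    static long copy(InputStream in, OutputStream out, long bytesPerSecond) {
        byte[] buffer = new byte[8192];
        long start = System.nanoTime();
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
                // How long the bytes so far should have taken at the target rate.
                long budgetNanos = (long) (total * 1e9 / bytesPerSecond);
                long aheadNanos = budgetNanos - (System.nanoTime() - start);
                if (aheadNanos > 0) {
                    Thread.sleep(aheadNanos / 1_000_000, (int) (aheadNanos % 1_000_000));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during throttled copy", e);
        }
        return total;
    }
}
```

Because the slave writes what it receives, capping the transfer rate this way also caps the average write rate, which is the disk I/O Mark is concerned about.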
Re: Changing masterUrl in ReplicationHandler at Runtime
It would be better to add a command to change the master at runtime. But Solr is planning to move to a ZooKeeper-based system where this can be taken care of automatically.

On Sat, Oct 10, 2009 at 6:06 AM, wojtekpia wojte...@hotmail.com wrote: Hi, I'm trying to change the masterUrl of a search slave at runtime. So far I've found 2 ways of doing it:
1. Change solrconfig_slave.xml on the master, and have it replicate to solrconfig.xml on the slave.
2. Change solrconfig.xml on the slave, then issue a core reload command. (A side note: can I issue the reload-core command without having a solr.xml file? I had to run a single core in multi-core mode to make this work.)
So far I like solution 2 better. Does it make sense to add a 'sticky' parameter to the ReplicationHandler's fetchindex command? Something like: http://slave_host:port/solr/replication?command=fetchindex&masterUrl=myUrl&stickyMasterUrl=true If true, then 'myUrl' would continue being used for replication, including future polling. Are there other solutions? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Changing-masterUrl-in-ReplicationHandler-at-Runtime-tp25829843p25829843.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
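The one-off form that wojtekpia already uses, fetchindex with a masterUrl parameter, is a plain HTTP call. Because masterUrl is itself a URL, its value should be URL-encoded so that any ? or & inside it cannot be mistaken for parameter separators. A sketch; the class name and host names are placeholders, and stickyMasterUrl is only a proposal in this thread, so it is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FetchIndexUrl {
    // One-off pull from a different master; the nested masterUrl value is URL-encoded.
    static String build(String slaveReplicationUrl, String masterReplicationUrl) {
        return slaveReplicationUrl + "?command=fetchindex&masterUrl="
                + URLEncoder.encode(masterReplicationUrl, StandardCharsets.UTF_8);
    }
}
```

Most servlet containers decode the parameter back to the original URL before the handler sees it.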
Re: Dynamic Data Import from multiple identical tables
there is another option of passing the table name as a request parameter and make your sql query templatized . example query=select * from ${table} and pass the value of table as a request parameter On Sat, Oct 10, 2009 at 3:52 AM, solr.searcher solr.searc...@gmail.com wrote: Hmmm. Interesting line of thought. Thanks a lot Jay. Will explore this approach. There are lot of duplicate tables though :). I was about to try a different approach - set up two solar cores, keep reloading config and updating one, merge with the bigger index ... But your approach is worth exploring. Thanks. Jay Hill wrote: You could use separate DIH config files for each of your three tables. This might be overkill, but it would keep them separate. The DIH is not limited to one request handler setup, so you could create a unique handler for each case with a unique name: requestHandler name=/indexer/table1 class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configtable1-config.xml/str /lst /requestHandler requestHandler name=/indexer/table2 class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configtable2-config.xml/str /lst /requestHandler requestHandler name=/indexer/table3 class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configtable3-config.xml/str /lst /requestHandler When you go to ...solr/admin/dataimport.jsp you should see a list of all DataImportHandlers that are configured, and can select them individually, if that works for your needs. -Jay http://www.lucidimagination.com On Fri, Oct 9, 2009 at 10:57 AM, solr.searcher solr.searc...@gmail.comwrote: Hi all, First of all, please accept my apologies if this has been asked and answered before. I tried my best to search and couldn't find anything on this. The problem I am trying to solve is as follows. I have multiple tables with identical schema - table_a, table_b, table_c ... 
and I am trying to create one big index with the data from each of these tables. The idea was to programatically create the data-config file (just changing the table name) and do a reload-config followed by a full-import with clean set to false. In other words: 1. publish the data-config file 2. do a reload-config 3. do a full-import with clean = false 4. commit, optimize 5. repeat with new table name I wanted to then follow the same procedure for delta imports. The problem is that after I do a reload-config and then do a full-import, the old data in the index is getting lost. What am I missing here? Please note that I am new to solr. INFO: [] webapp=/solr path=/dataimport params={command=reload-configclean=false} status=0 QTime=4 Oct 9, 2009 10:17:30 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-importclean=false} status=0 QTime=1 Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity blah blah blah Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 12 Oct 9, 2009 10:17:31 AM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/blah/blah/index,segFN=segments_1z,version=1255032607825,generation=71,filenames=[segments_1z, _cl.cfs] Oct 9, 2009 10:17:31 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1255032607825 Any help will be greatly appreciated. Is there any other way to automaticaly slurp data from multiple, identical tables? Thanks a lot. 
-- View this message in context: http://www.nabble.com/Dynamic-Data-Import-from-multiple-identical-tables-tp25825381p25825381.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Dynamic-Data-Import-from-multiple-identical-tables-tp25825381p25828773.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
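Noble's templated-query suggestion can be driven by a small client loop: one full-import per table, with clean=false so earlier tables are kept, and the table name passed as an extra request parameter that the data-config reads as ${dataimporter.request.table}. The sketch below only builds the request URLs; the handler URL and the parameter name `table` are assumptions for illustration. Because the value is spliced into SQL, only pass trusted table names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Builds one DIH full-import request per source table. The table name travels as
 * a hypothetical request parameter "table", which a data-config query such as
 * select * from ${dataimporter.request.table} could read.
 */
class PerTableImportUrls {

    static List<String> fullImportUrls(String handlerUrl, List<String> tables) {
        List<String> urls = new ArrayList<>();
        for (String table : tables) {
            urls.add(handlerUrl
                    + "?command=full-import"
                    + "&clean=false"   // keep documents imported from earlier tables
                    + "&commit=true"
                    + "&table=" + URLEncoder.encode(table, StandardCharsets.UTF_8));
        }
        return urls;
    }
}
```

full-import runs asynchronously, so a caller would wait for the handler to go idle before issuing the next table's request.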
Re: DIH and EmbeddedSolr
I guess it should be possible... What problems are you encountering? On Sat, Oct 10, 2009 at 10:56 AM, rohan rai hiroha...@gmail.com wrote: I have been unable to use DIH with Embedded Solr. Is there a way? Regards Rohan -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: http replication transfer speed
Did you try without firing queries on the slave? On Sun, Oct 11, 2009 at 6:05 AM, Mark Miller markrmil...@gmail.com wrote: A drive that can do 40+ MB/s but is also getting query load might have its writes knocked down to that? - Mark http://www.lucidimagination.com (mobile) On Oct 10, 2009, at 6:41 PM, Mark Miller markrmil...@gmail.com wrote: Anyone know why you would see a transfer speed of just 10-20 MB/s over a gigabit network connection? Even with standard drives, I would expect to see at least around 40 MB/s. Has anyone seen more than 10-20 MB/s using replication? Any ideas on what the bottleneck could be? I think even a standard drive can do writes of a bit over 40 MB/s, and certainly reads over that. Thoughts? -- - Mark http://www.lucidimagination.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
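A quick back-of-the-envelope check of Mark's numbers: a gigabit link carries at most about 125 MB/s of payload (decimal units, ignoring protocol overhead), so 10-20 MB/s uses only roughly 8-16% of it, which points at something other than the network, such as disk writes competing with query load. The helper names are illustrative.

```java
/** Back-of-the-envelope check: how much of a network link a transfer rate uses. */
class LinkUtilization {

    /** Theoretical ceiling in MB/s for a link of the given Gbit/s (decimal units, no protocol overhead). */
    static double lineRateMBps(double gbitPerSec) {
        return gbitPerSec * 1000.0 / 8.0; // 1 Gbit/s = 1000 Mbit/s = 125 MB/s
    }

    /** Fraction of the link's theoretical capacity used by an observed rate in MB/s. */
    static double utilization(double observedMBps, double gbitPerSec) {
        return observedMBps / lineRateMBps(gbitPerSec);
    }
}
```

For example, `utilization(20.0, 1.0)` gives 0.16, i.e. about 16% of the link.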
Re: doing searches from within an UpdateRequestProcessor
A custom UpdateRequestProcessor is the solution. You can access the searcher in an UpdateRequestProcessor. On Tue, Oct 13, 2009 at 4:20 AM, Bill Au bill.w...@gmail.com wrote: Is it possible to do searches from within an UpdateRequestProcessor? The documents in my index reference each other. When a document is deleted, I would like to update all documents containing a reference to the deleted document. My initial idea is to use a custom UpdateRequestProcessor. Is there a better way to do this? Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Lucene Merge Threads
Which version of Solr are you using? The <int name="maxThreadCount">1</int> syntax was added recently. On Tue, Oct 13, 2009 at 8:08 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: This didn't end up working. I got the following error when I tried to commit: Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error loading class ' 5 ' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325) at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81) at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:178) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.ClassNotFoundException: 5 at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.$$YJP$$doPrivileged(Native Method) at java.security.AccessController.doPrivileged(Unknown Source) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.net.FactoryURLClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at java.lang.Class.$$YJP$$forName0(Native Method) at java.lang.Class.forName0(Unknown Source) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294) ... 28 more I believe it's because maxThreadCount is not a public field of the ConcurrentMergeScheduler class. You have to call this method to set it: public void setMaxThreadCount(int count) { if (count < 1) throw new IllegalArgumentException("count should be at least 1"); maxThreadCount = count; } Is that possible through the solrconfig? Thanks, Gio.
-Original Message- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, October 12, 2009 7:53 PM To: solr-user@lucene.apache.org Subject: RE: Lucene Merge Threads Do you have to make a new call to optimize to make it start the merge again? -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Monday, October 12, 2009 7:28 PM To: solr-user@lucene.apache.org Subject: Re: Lucene Merge Threads Try this in solrconfig.xml: mergeScheduler class=org.apache.lucene.index.ConcurrentMergeScheduler int
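Gio's point is that the thread count is reachable only through a setter, not a public field. As a generic illustration of how a named config argument such as maxThreadCount can be applied through a bean-style setter via reflection, here is a sketch; it is not Solr's actual SolrIndexWriter code, and MergeSchedulerStub is a hypothetical stand-in for ConcurrentMergeScheduler that reuses the validation quoted above.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for ConcurrentMergeScheduler's thread-count setting. */
class MergeSchedulerStub {
    private int maxThreadCount = 1;

    public void setMaxThreadCount(int count) {
        if (count < 1) throw new IllegalArgumentException("count should be at least 1");
        maxThreadCount = count;
    }

    public int getMaxThreadCount() {
        return maxThreadCount;
    }
}

class SetterInjector {

    /** Calls set<Name>(int) on target, e.g. "maxThreadCount" -> setMaxThreadCount(int). */
    static void applyIntArg(Object target, String name, int value) throws ReflectiveOperationException {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        Method m = target.getClass().getMethod(setter, int.class);
        m.invoke(target, value); // an exception thrown by the setter arrives wrapped in InvocationTargetException
    }
}
```

With this pattern, a config value of 0 is rejected by the setter itself rather than being loaded as a class name, which is the failure mode in the stack trace above.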
Re: Adding callback url to data import handler...Is this possible?
I can understand the concern that you do not wish to write Java code. But a callback URL is a very specific requirement. We plan to extend JavaScript support to the EventListener callback. Will that help? On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm ... I think this is a valid use case and it might be a good idea to support it in some way. I will post this thread on the dev mailing list to seek opinions. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.com wrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned, this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR via its convenient REST-style interfaces, which make no demands on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code.
We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
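Until a callback exists, the workaround Bill describes is polling: request .../dataimport?command=status and wait until the handler reports "idle" instead of "busy". The loop below takes the status source as a Supplier so it can be exercised without a server; in practice the supplier would fetch the status URL and pull out the status value. It is written in Java only for illustration (Bill's app is C#), and the class name is hypothetical.

```java
import java.util.function.Supplier;

/** Polls a DIH-style status source until it reports "idle". */
class ImportStatusPoller {

    /**
     * Returns the number of polls made once the status is "idle",
     * or -1 if it never became idle within maxPolls.
     */
    static int waitUntilIdle(Supplier<String> statusSource, int maxPolls, long sleepMillis)
            throws InterruptedException {
        for (int poll = 1; poll <= maxPolls; poll++) {
            if ("idle".equals(statusSource.get())) {
                return poll;
            }
            Thread.sleep(sleepMillis); // back off between status requests
        }
        return -1;
    }
}
```

One caveat with polling: immediately after issuing full-import, the handler may still report "idle" before the import has started, so a first poll shortly after the request can be misleading.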
Re: Adding callback url to data import handler...Is this possible?
It is not yet implemented. You may open an issue for it. --Noble On Thu, Oct 15, 2009 at 12:14 PM, William Pierce evalsi...@hotmail.com wrote: If the JavaScript support enables me to invoke a URL, it's really OK with me. Cheers, - Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 11:01 PM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback. I would say the latter is more specific than the former. People who are comfortable writing Java wouldn't need any of these, but the second-best thing for others would be a capability to handle it in their own applications. A URL can be the simplest way to invoke things in the respective application. Doing it via JavaScript sounds like a roundabout way of doing it. The eventhandler Cheers Avlesh 2009/10/15 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com I can understand the concern that you do not wish to write Java code. But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback. Will it help? On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm ... I think this is a valid use case and it might be a good idea to support it in some way. I will post this thread on the dev mailing list to seek opinions. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.com wrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned, this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR via its convenient REST-style interfaces, which make no demands on our app environment. Thanks again for your suggestion!
Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Using DIH's special commands....Help needed
Use LogTransformer to see if the value is indeed set: <entity name="post" transformer="script:DeleteRow, RegexTransformer,LogTransformer" logTemplate="${post}" query="select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4)"> This should print out the entire row after the transformations. On Fri, Oct 16, 2009 at 3:04 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time... I'd appreciate any tips/pointers. Thanks, - Bill -- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Thursday, October 15, 2009 1:42 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed On Fri, Oct 16, 2009 at 12:46 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your help. Here is my DIH config file... I'd appreciate any help/pointers you may give me. No matter what I do the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has values of either 1 (which means add it to solr), or 4 (which means delete the document with the primary key from SOLR index). I have two transformers running. Not sure what I am doing wrong.
<dataConfig>
  <script><![CDATA[
    function DeleteRow(row) {
      var jis = row.get('IndexingStatus');
      var jid = row.get('Id');
      if ( jis == 4 ) {
        row.put('$deleteDocById', jid);
      }
      return row;
    }
  ]]></script>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="**" password="***"/>
  <document>
    <entity name="post" transformer="script:DeleteRow, RegexTransformer"
            query="select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4)">
      <field column="ptype" splitBy="," sourceColName="a" />
      <field column="wauth" splitBy="," sourceColName="b" />
      <field column="miles" splitBy="," sourceColName="c" />
    </entity>
  </document>
</dataConfig>
One thing I'd try is to use '4' for comparison rather than the number 4 (the type would depend on the sql type). Also, for javascript transformers to work, you must use JDK 6 which has javascript support. Rest looks fine to me. -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
DIH wiki page reverted
I have reverted the DIH wiki page to revision 212. See https://issues.apache.org/jira/browse/INFRA-2270. The wiki has not sent any mail yet, so all the changes made after revision 212 are lost. Please go through the page and check whether your changes were lost. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Using DIH's special commands....Help needed
It is strange that LogTransformer did not log the data. On Fri, Oct 16, 2009 at 5:54 PM, William Pierce evalsi...@hotmail.com wrote: Folks: Continuing my saga with DIH and use of its special commands. I have verified that the script functionality is indeed working. I also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows: script ![CDATA[ function DeleteRow(row) { var jid = row.get('Id'); var jis = row.get('IndexingStatus'); if ( jis == 4 ) { row.put('$deleteDocById', jid); row.remove('Col1'); row.put('Col1', jid); } return row; } ]] /script The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from solr index. Just to be sure that javascript syntax was correct and checked out, I intentionally overwrite a field called 'Col1' in my schema with primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly since IndexingStatus for each row is 1. There are no rows to delete. I then go into the db and set one row with the IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported correctly. However, for the row for which 'IndexingStatus' was set to 4, the Col1 value is set correctly by the script transformer to be the primary key value for that row/document. However, I should not be seeing that document since the '$deleteDocById' should have deleted this from solr. Could this be a bug in solr? Or, am I misunderstanding how $deleteDocById works? By the way, Noble, I tried to set the LogTransformer, and add logging per your suggestion. That did not work either. I set logLevel=debug, and also turned on solr logging in the admin console to be the max value (finest) and still no output. Thanks, - Bill -- From: Noble Paul നോബിള് नोब्ळ्
noble.p...@corp.aol.com Sent: Thursday, October 15, 2009 10:05 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed use LogTransformer to see if the value is indeed set entity name=post transformer=script:DeleteRow, RegexTransformer,LogTransformer logTemplate=${post} query= select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4) this should print out the entire row after the transformations On Fri, Oct 16, 2009 at 3:04 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time... I'd appreciate any tips/pointers. Thanks, - Bill -- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Thursday, October 15, 2009 1:42 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed On Fri, Oct 16, 2009 at 12:46 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your help. Here is my DIH config file... I'd appreciate any help/pointers you may give me. No matter what I do the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has values of either 1 (which means add it to solr), or 4 (which means delete the document with the primary key from SOLR index). I have two transformers running. Not sure what I am doing wrong.
dataConfig script![CDATA[ function DeleteRow(row) { var jis = row.get('IndexingStatus'); var jid = row.get('Id'); if ( jis == 4 ) { row.put('$deleteDocById', jid); } return row; } ]]/script dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/db user=** password=***/ document entity name=post transformer=script:DeleteRow, RegexTransformer query= select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4) field column=ptype splitBy=, sourceColName=a / field column=wauth splitBy=, sourceColName=b / field column=miles splitBy=, sourceColName=c / /entity /document /dataConfig One thing I'd try is to use '4' for comparison rather than the number 4 (the type would depend on the sql type). Also, for javascript transformers to work, you must use JDK 6 which has javascript support.
Re: Using DIH's special commands....Help needed
postImportDeleteQuery is fine in your case. On Sat, Oct 17, 2009 at 3:16 AM, William Pierce evalsi...@hotmail.com wrote: Shalin, Many thanks for your tip... But it did not seem to help! Do you think I can use postImportDeleteQuery for this task? Should I file a bug report? Cheers, Bill -- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Friday, October 16, 2009 1:16 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed On Fri, Oct 16, 2009 at 5:54 PM, William Pierce evalsi...@hotmail.com wrote: Folks: Continuing my saga with DIH and use of its special commands. I have verified that the script functionality is indeed working. I also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows: script ![CDATA[ function DeleteRow(row) { var jid = row.get('Id'); var jis = row.get('IndexingStatus'); if ( jis == 4 ) { row.put('$deleteDocById', jid); row.remove('Col1'); row.put('Col1', jid); } return row; } ]] /script The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from solr index. Just to be sure that javascript syntax was correct and checked out, I intentionally overwrite a field called 'Col1' in my schema with primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly since IndexingStatus for each row is 1. There are no rows to delete. I then go into the db and set one row with the IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported correctly. However, for the row for which 'IndexingStatus' was set to 4, the Col1 value is set correctly by the script transformer to be the primary key value for that row/document. However, I should not be seeing that document since the '$deleteDocById' should have deleted this from solr. Could this be a bug in solr? Or, am I misunderstanding how $deleteDocById works?
Would the row which has IndexingStatus=4 also create a document with the same uniqueKey which you would delete using the transformer? If yes, that can explain what is happening and you can avoid that by adding a $skipDoc flag in addition to the $deleteDocById flag. I know this is a basic question but you are using Solr 1.4, aren't you? -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
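Shalin's two suggestions can be combined in the transformer: compare the status as a string so it matches whether the JDBC driver returns a number or a string, and set $skipDoc alongside $deleteDocById so the row being deleted is not added straight back. The sketch models the DeleteRow script as plain Java over a Map; it does not use DIH's Transformer API, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the DeleteRow script transformer (no DIH classes involved). */
class DeleteRowModel {

    static Map<String, Object> deleteRow(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        Object status = row.get("IndexingStatus");
        // Compare as a string so Integer 4, Long 4 and "4" all match.
        if (status != null && "4".equals(String.valueOf(status))) {
            out.put("$deleteDocById", row.get("Id"));
            out.put("$skipDoc", "true"); // do not add this row back as a document
        }
        return out;
    }
}
```

In the JavaScript version the equivalent change would be adding row.put('$skipDoc', 'true') inside the if block.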
Re: Using DIH's special commands....Help needed
The accepted logLevel values are error, debug, warn, trace and info. 2009/10/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog goks...@gmail.com wrote: I had this problem also, but I was using the Jetty example. I fail at logging configurations about 90% of the time, so I assumed it was my fault. Did you also set the logLevel attribute in the entity? If you set logLevel=severe it should definitely be printed. 2009/10/17 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: It is strange that LogTransformer did not log the data. On Fri, Oct 16, 2009 at 5:54 PM, William Pierce evalsi...@hotmail.com wrote: Folks: Continuing my saga with DIH and use of its special commands. I have verified that the script functionality is indeed working. I also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows: script ![CDATA[ function DeleteRow(row) { var jid = row.get('Id'); var jis = row.get('IndexingStatus'); if ( jis == 4 ) { row.put('$deleteDocById', jid); row.remove('Col1'); row.put('Col1', jid); } return row; } ]] /script The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from solr index. Just to be sure that javascript syntax was correct and checked out, I intentionally overwrite a field called 'Col1' in my schema with primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly since IndexingStatus for each row is 1. There are no rows to delete. I then go into the db and set one row with the IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported correctly. However, for the row for which 'IndexingStatus' was set to 4, the Col1 value is set correctly by the script transformer to be the primary key value for that row/document. However, I should not be seeing that document since the '$deleteDocById' should have deleted this from solr.
Could this be a bug in solr? Or, am I misunderstanding how $deleteDocById works? By the way, Noble, I tried to set the LogTransformer, and add logging per your suggestion. That did not work either. I set logLevel=debug, and also turned on solr logging in the admin console to be the max value (finest) and still no output. Thanks, - Bill -- From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Sent: Thursday, October 15, 2009 10:05 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed use LogTransformer to see if the value is indeed set entity name=post transformer=script:DeleteRow, RegexTransformer,LogTransformer logTemplate=${post} query= select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4) this should print out the entire row after the transformations On Fri, Oct 16, 2009 at 3:04 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time... I'd appreciate any tips/pointers. Thanks, - Bill -- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Thursday, October 15, 2009 1:42 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed On Fri, Oct 16, 2009 at 12:46 AM, William Pierce evalsi...@hotmail.com wrote: Thanks for your help. Here is my DIH config file... I'd appreciate any help/pointers you may give me. No matter what I do the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has values of either 1 (which means add it to solr), or 4 (which means delete the document with the primary key from SOLR index). I have two transformers running. Not sure what I am doing wrong.
dataConfig script![CDATA[ function DeleteRow(row) { var jis = row.get('IndexingStatus'); var jid = row.get('Id'); if ( jis == 4 ) { row.put('$deleteDocById', jid); } return row; } ]]/script dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/db user=** password=***/ document entity name=post transformer=script:DeleteRow
Re: [DIH] URLDataSource and fetching a link
<entity name="nytSportsFeed" pk="link" url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor" forEach="/rss/channel | /rss/channel/item"
        dataSource="rss" transformer="RegexTransformer,DateFormatTransformer">
  <field column="source" xpath="/rss/channel/title" commonField="true" />
  <field column="source-link" xpath="/rss/channel/link" commonField="true" />
  <field column="title" xpath="/rss/channel/item/title" />
  <field column="id" xpath="/rss/channel/item/guid" />
  <field column="link" xpath="/rss/channel/item/link" />
  <!-- Use the RegexTransformer to strip out ads -->
  <field column="description" xpath="/rss/channel/item/description" regex="&lt;a.*?&lt;/a&gt;" replaceWith=""/>
  <field column="category" xpath="/rss/channel/item/category" />
  <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
  <field column="pubDate" xpath="/rss/channel/item/pubDate" dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
  <entity name="x" url="${nytSportsFeed.link}" processor="PlainTextEntityProcessor"
          dataSource="rss" transformer="HTMLStripTransformer">
    <field column="plainText" name="body" stripHTML="true"/>
  </entity>
</entity>
On Tue, Oct 20, 2009 at 6:13 PM, Grant Ingersoll gsing...@apache.org wrote: Finally getting back to this... On Sep 17, 2009, at 12:28 AM, Noble Paul നോബിള് नोब्ळ् wrote: 2009/9/17 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: It is possible to have a sub-entity which has XPathEntityProcessor which can use the link as the url. This may not be a good solution, but you can use the $hasMore and $nextUrl options of XPathEntityProcessor to recursively loop if there are more links. Is there an example of this somewhere? The DIH Wiki refers to it, but I don't see an example of it.
I have: entity name=nytSportsFeed pk=link url= http://feeds1.nytimes.com/nyt/rss/Sports; processor=XPathEntityProcessor forEach=/rss/channel | /rss/channel/item dataSource=rss transformer=RegexTransformer,DateFormatTransformer field column=source xpath=/rss/channel/title commonField=true / field column=source-link xpath=/rss/channel/link commonField=true / field column=title xpath=/rss/channel/item/title / field column=id xpath=/rss/channel/item/guid / field column=link xpath=/rss/channel/item/link / !-- Use the RegexTransformer to strip out ads -- field column=description xpath=/rss/channel/item/description regex=lt;a.*?lt;/agt; replaceWith=/ field column=category xpath=/rss/channel/item/category / !-- 'Sun, 18 May 2008 11:23:11 +' -- field column=pubDate xpath=/rss/channel/item/pubDate dateTimeFormat=EEE, dd MMM HH:mm:ss Z / /entity And I want to take the value from the link column and go get the contents of that link and index them into a body field. I'm not sure how to link in the sub-entity. Thanks, Grant On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll gsing...@apache.org wrote: Many RSS feeds contain a link to some full article. How can I have the DIH get the RSS feed and then have it go and fetch the content at the link? Thanks, Grant -- - Noble Paul | Principal Engineer| AOL | http://aol.com
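In Noble's config, the outer entity emits one row per /rss/channel/item, and the inner entity fetches ${nytSportsFeed.link} for each row. To check outside DIH which links the outer entity would hand to the inner one, the standard JDK XPath API can run the same path over a saved feed. This is only a testing aid: DIH's XPathEntityProcessor uses its own streaming XPath subset, not this code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RssLinkExtractor {

    /** Returns the text of every /rss/channel/item/link element (the channel's own link is excluded). */
    static List<String> itemLinks(String rssXml)
            throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/rss/channel/item/link", doc, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getTextContent().trim());
        }
        return links;
    }
}
```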
Re: Disable replication on master while slaves are pulling
On Wed, Oct 21, 2009 at 2:31 PM, Jérôme Etévé jerome.et...@gmail.com wrote: Hi there, I'm planning to reindex all my data on my master server every day, so here's what I intend to do on the master: 1 - disable replication on the master 2 - Empty the index 3 - Reindex everything 4 - Optimize 5 - enable replication again There's something I'm wondering about this strategy. What would happen if a slave has not finished pulling the data when I start step 1? Any replication already in progress will be completed even after you disable replication. If you are planning to delete all docs using Solr itself, there is no problem; Solr takes care of it automatically. If you plan to delete the index files directly, give it some time after you disable replication, and then clean up your index. Is there a better strategy to achieve daily complete reindexing? Thanks! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net -- - Noble Paul | Principal Engineer| AOL | http://aol.com
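Jérôme's five steps can be expressed as an ordered list of master-side requests. This sketch assumes the reindex is done with DIH (the thread does not say so) and that the core lives at a hypothetical base URL. It uses the ReplicationHandler's disablereplication/enablereplication commands and lets full-import with clean=true empty the index inside Solr, which is the "delete using Solr itself" case Noble says needs no special care.

```java
import java.util.List;

/** Ordered master-side requests for a daily rebuild, assuming the index is rebuilt with DIH. */
class DailyReindexPlan {

    static List<String> requests(String coreUrl) {
        return List.of(
                // step 1: stop serving new replications (in-progress ones still finish)
                coreUrl + "/replication?command=disablereplication",
                // steps 2-4: clean=true empties the index inside Solr before importing; optimize at the end
                coreUrl + "/dataimport?command=full-import&clean=true&commit=true&optimize=true",
                // step 5: only after /dataimport?command=status reports "idle" (full-import is asynchronous)
                coreUrl + "/replication?command=enablereplication");
    }
}
```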
Re: is EmbeddedSolrServer thread safe ?
yes On Thu, Oct 22, 2009 at 2:38 PM, jfmel...@free.fr wrote: at SolrJ wiki page : http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer CommonsHttpSolrServer is thread-safe and if you are using the following constructor, you *MUST* re-use the same instance for all requests. ... But is it the same for EmbeddedSolrServer ? Best regards Jean-François -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: multicore query via solrJ
You guessed right. SolrJ cannot query multiple cores. 2009/10/23 Licinio Fernández Maurelo licinio.fernan...@gmail.com: As no answer has been given, I assume it's not possible. It would be great to code a method like this: query(SolrServer, List<SolrServer>) On 20 October 2009 11:21, Licinio Fernández Maurelo licinio.fernan...@gmail.com wrote: Hi there, is there any way to perform a multi-core query using SolrJ? P.S.: I know about this syntax: http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q= but I'm looking for a fancier way to do this using SolrJ (something like shards(query)). thx -- Lici -- Lici -- - Noble Paul | Principal Engineer| AOL | http://aol.com
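Without a dedicated SolrJ method, the shards parameter that Lici already knows can still be set on the query's params. A small helper can build that value from a list of core URLs, since the parameter expects comma-separated host:port/path entries without a scheme. The helper name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Builds a "shards" parameter value from full core URLs. */
class ShardsParam {

    static String fromCoreUrls(List<String> coreUrls) {
        return coreUrls.stream()
                .map(url -> url.replaceFirst("^https?://", "")) // shards entries carry no scheme
                .collect(Collectors.joining(","));
    }
}
```

With SolrJ, the result would be passed as an ordinary parameter, e.g. `query.set("shards", ShardsParam.fromCoreUrls(urls))` on a SolrQuery, before calling `server.query(query)`.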
Re: SolrJ and Json
CommonsHttpSolrServer will overwrite the wt param depending on the ResponseParser that is set. There are only two response parsers: javabin and XML. qresponse.toString() is actually a String representation of a NamedList object; it has nothing to do with JSON. On Fri, Oct 23, 2009 at 2:11 PM, SGE0 stefangee...@hotmail.com wrote: Hi, I have the following problem: Using CommonsHttpSolrServer (javabin format) I do a query with wt=json and get the following response (by using qresponse = solr.query(params); and then qresponse.toString();): {responseHeader={status=0,QTime=16,params={indent=on,start=0,q=mmm,qt=dismax,wt=[javabin, javabin],hl=on,rows=10,version=[1, 1]}},response={numFound=0,start=0,docs=[]},highlighting={}} Now this does not seem to be JSON format (or is it?). Should the equals sign not be a ':' and the values surrounded with double quotes? The problem is that I want to pass the qresponse to a JavaScript variable so the client JavaScript code can then inspect the JSON response and do whatever is needed. What I did was: var str = <%=qresponse.toString()%>; but I can't seem to correctly read the str variable as a JSON object and parse it (on the client side). Any ideas or code snippets to show the correct way? Regards, St. -- View this message in context: http://www.nabble.com/SolrJ-and-Json-tp26022705p26022705.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: CAS client configuration with MT4-PHP.
Is this a Solr-related question? On Fri, Oct 23, 2009 at 6:46 PM, Radha C. cra...@ceiindia.com wrote: Hi, we have a CAS server integrated with Spring, running in Apache. We have an application in MovableType4 (PHP). Is it possible to configure the MT4 authentication module to redirect to the external CAS server when the application receives a login request? It would be helpful if there were any documentation available for this. Thanks in advance. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Solrj client API and response in XML format (Solr 1.4)
There is no need to use HttpClient. Use new java.net.URL(url).openConnection(), read the InputStream into a buffer, and that is it. On Sat, Oct 24, 2009 at 1:53 PM, SGE0 stefangee...@hotmail.com wrote: Hi Paul, thanks again. Can I use this technique from within a servlet? Do I need an instance of HttpClient to do that? I noticed I can instantiate CommonsHttpSolrServer with an HttpClient client. I did not find any relevant examples of how to use this. If you can help me out with this, it would be much appreciated. Stefan Noble Paul നോബിള് नोब्ळ्-2 wrote: Hi, you don't see the point. You really don't need to use SolrJ. All you need to do is make an HTTP request with wt=json, read the output into a buffer, and send it to your client. --Noble On Fri, Oct 23, 2009 at 9:40 PM, SGE0 stefangee...@hotmail.com wrote: Hi All, after a day of searching I'm quite confused. I use the SolrJ client as follows: CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://127.0.0.1:8080/apache-solr-1.4-dev/test"); solr.setRequestWriter(new BinaryRequestWriter()); ModifiableSolrParams params = new ModifiableSolrParams(); params.set("qt", "dismax"); params.set("indent", "on"); params.set("version", "2.2"); params.set("q", "test"); params.set("start", 0); params.set("rows", 10); params.set("wt", "xml"); params.set("hl", "on"); QueryResponse response = solr.query(params); How can I get the query result (response) out in XML format? I know it sounds stupid, but I can't seem to manage it. What do I need to do with the response object to get the response in XML format? I already understood I can't get the result in JSON, so my idea was to go from XML to JSON. Thanks for your answer already! S. System.out.println("response = " + response); SolrDocumentList sdl = response.getResults(); -- View this message in context: http://www.nabble.com/Solrj-client-API-and-response-in-XML-format-%28Solr-1.4%29-tp26029197p26029197.html Sent from the Solr - User mailing list archive at Nabble.com.
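The approach described above (open a connection, then read the stream into a buffer) can be sketched with `java.net.URL` alone, so it also works inside a servlet without HttpClient. This is a minimal sketch: the class name `RawFetch` is hypothetical, and decoding as UTF-8 assumes Solr's default response encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a URL and returns the body as a String.
final class RawFetch {
    private RawFetch() {}

    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // copy the whole response into the buffer
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

In a servlet, the returned string could then be written to the client, for example after `response.setContentType("application/json")`.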
-- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/Solrj-client-API-and-response-in-XML-format-%28Solr-1.4%29-tp26029197p26037037.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Shards param accepts spaces between commas?
On Sun, Oct 25, 2009 at 9:34 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : It seems like no, and it should be an easy change. I'm putting newlines : after the commas so the large shards list doesn't scroll off the : screen. Yeah ... for some odd reason QueryComponent is using StrUtils.splitSmart() ... SolrPluginUtils.split() seems like a saner choice. A better question is probably why the shards param isn't just multivalued. Good question. I guess it should be (Yonik?) -Hoss -- - Noble Paul | Principal Engineer| AOL | http://aol.com
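The whitespace-tolerant behaviour being requested amounts to splitting on commas and trimming each entry. Below is a plain-JDK sketch of that parsing. It is not Solr's actual `StrUtils` or `SolrPluginUtils` code. Spaces and newlines around entries are dropped, and so are empty entries left by a trailing comma.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a shards list that may contain spaces or newlines after commas.
final class ShardList {
    private ShardList() {}

    static List<String> parse(String shards) {
        List<String> entries = new ArrayList<>();
        for (String part : shards.split(",")) {
            // trim() removes spaces, tabs and newlines around each entry
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }
}
```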
Re: Solr Random field
Do you have a field whose type is random? If yes, you can sort by that field. On Mon, Oct 26, 2009 at 3:35 PM, Pooja Verlani pooja.verl...@gmail.com wrote: Hi, I want a random sort type in the search results. The scenario is: if I am not able to find any relevant results, I want to return random results with no contextual relation to the query fired. I want something like: http://localhost:8083/solr/select/?q=*:*&sort=RANDOM. Please suggest. Regards, Pooja -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Solr Configuration Management
2009/10/26 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi there, I must improve our Solr config deploys. I have a configuration file per environment and per role (master/slave), so I want to separate DataSource definitions from solrconfig.xml. Where can I put them? Are you referring to DIH? The same behaviour is desired for master/slave config differences. You can put all your custom properties in a solrcore.properties file (placed in the conf dir) and have different properties files for master and slave. These properties can be referred to directly from solrconfig.xml. Any help would be much appreciated ... -- Lici -- - Noble Paul | Principal Engineer| AOL | http://aol.com
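As an illustration of the solrcore.properties approach, solrconfig.xml can reference a property as `${name}`, or with a fallback as `${name:default}`. The value then comes from the properties file deployed with each core. The sketch below is a simplified, hypothetical re-implementation of that substitution, not Solr's code. The property name `enable.master` is only an example of a per-role switch.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified ${name} / ${name:default} substitution against a Properties object.
final class PropertySubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    private PropertySubstitution() {}

    static String substitute(String text, Properties props) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                value = m.group(2); // fall back to the default after ':' (null if none given)
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for property " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A master core would ship `enable.master=true` in its properties file, and a slave core would omit it or set it to `false`, while both use the same solrconfig.xml.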
Re: problem using solr 1.4 multicore with shareSchema=true
Hi, looks like a bug. Open an issue. On Wed, Oct 28, 2009 at 4:04 AM, Jeremy Hinegardner jer...@hinegardner.org wrote: Hi all, I was trying to use the new 'shareSchema=true' feature in Solr 1.4, and it appears that sharing only happens in one configuration. I'd like someone to confirm this for me, and then we can file a bug on it. This all happens in CoreContainer.create(). When you have shareSchema="true" in solr.xml, an instance variable indexSchemaCache is created in the CoreContainer instance. This snippet is from CoreContainer.create: if (indexSchemaCache != null){ //schema sharing is enabled. so check if it already is loaded [1] File schemFile = new File(solrLoader.getInstanceDir() + "conf" + File.separator + dcore.getSchemaName()); if(schemFile.exists()){ [2] String key = schemFile.getAbsolutePath()+":"+new SimpleDateFormat("yyyyMMddhhmmss").format(new Date(schemFile.lastModified())); schema = indexSchemaCache.get(key); if(schema == null){ log.info("creating new schema object for core: " + dcore.name); schema = new IndexSchema(config, dcore.getSchemaName(), null); indexSchemaCache.put(key,schema); } else { log.info("re-using schema object for core: " + dcore.name); } } } if(schema == null){ schema = new IndexSchema(config, dcore.getSchemaName(), null); } A couple of points: [1] dcore.getSchemaName() is the value of the 'schema' attribute of the <core/> element in solr.xml. This means that it MUST be relative to the core-instance-dir/conf directory. Putting an absolute path in the XML means that schemFile.exists() will always return false. That is, if I put in <core name="core0" schema="/opt/search/solr/conf/multicore-common-schema.xml"/> then schemFile will have a path of: /path/to/core0/instanceDir/conf/opt/search/solr/conf/multicore-common-schema.xml which never exists.
[2] If you do use a relative path to the schema.xml file, then the key will always be unique: each schemFile is relative to a core's instanceDir, so the core name is in the path and schemFile.getAbsolutePath() will be unique for every core. The result is that, if I want to use shareSchema, the only way it seems to work is if two cores use the same instanceDir but different dataDirs. I tried a test with this solr.xml in the example multicore configuration, and this appears to be the only way to reuse the schema instance. To me this has a bit of a smell: <solr persistent="false"> <cores adminPath="/admin/cores" shareSchema="true"> <core name="core0" instanceDir="mcore" schema="schema-common.xml" dataDir="core0/data"/> <core name="core1" instanceDir="mcore" schema="schema-common.xml" dataDir="core1/data"/> </cores> </solr> In my initial experiments with this feature, I assumed that just putting in the full path to a common schema.xml file would do the trick. That is evidently not how it works. How is shareSchema=true supposed to work? enjoy, -jeremy -- Jeremy Hinegardner jer...@hinegardner.org -- - Noble Paul | Principal Engineer| AOL | http://aol.com
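Point [1] can be reproduced with plain JDK path handling. Below is a minimal sketch with hypothetical method names. `naiveResolve` mimics the string concatenation in the quoted snippet. `resolve` shows one possible alternative based on `java.nio.file.Path.resolve`, which returns the schema path unchanged when it is already absolute. This is not necessarily how the issue was fixed in Solr. The expected paths assume a Unix-style file system.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of how the schema file path is derived for each core.
final class SchemaPathResolver {
    private SchemaPathResolver() {}

    // Mimics the quoted CoreContainer logic: the schema name is always appended to instanceDir + "conf"
    static String naiveResolve(String instanceDir, String schemaName) {
        return new File(instanceDir + "conf" + File.separator + schemaName).getPath();
    }

    // Possible alternative: absolute schema paths are used as-is,
    // relative ones are resolved against instanceDir/conf
    static Path resolve(String instanceDir, String schemaName) {
        return Paths.get(instanceDir, "conf").resolve(schemaName).normalize();
    }
}
```

With the second variant, two cores that name the same absolute schema file would yield the same absolute path, and therefore the same cache key, which is what shareSchema needs.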
Re: problem using solr 1.4 multicore with shareSchema=true
I've opened an issue https://issues.apache.org/jira/browse/SOLR-1527 2009/10/28 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: hi, Looks like a bug. open an issue. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Fwd: Full Text Search: Solr on Cassandra
-- Forwarded message -- From: Nick Lothian nloth...@educationau.edu.au Date: Wed, Oct 28, 2009 at 11:37 AM Subject: Full Text Search: Solr on Cassandra To: cassandra-u...@incubator.apache.org cassandra-u...@incubator.apache.org Just in case anyone here is interested, I've managed to get Solr working on Cassandra using Jake Luciani's Lucandra (Lucene on Cassandra). It's very early code, but it does prove it is possible. Details (and code): http://nicklothian.com/blog/2009/10/27/solr-cassandra-solandra/ Regards Nick Lothian -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Question about DIH execution order
On Sun, Nov 1, 2009 at 11:59 PM, Bertie Shen bertie.s...@gmail.com wrote: Hi folks, I have the following data-config.xml. Is there a way to make the transformation take place after executing the SQL select comment from Rating where Rating.CourseId = ${Course.CourseId}? In the MySQL database, column CourseId in table Course is an integer (1, 2, etc.); the template transformation turns these into Course:1, Course:2; column CourseId in table Rating is also an integer (1, 2, etc.). If the transformation happens before executing select comment from Rating where Rating.CourseId = ${Course.CourseId}, the SQL statement will find no match. <document> <entity name="Course" transformer="TemplateTransformer" query="select * from Course"> <field column="CourseId" template="Course:${Course.CourseId}" name="id"/> <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}"> <field column="comment" name="review"/> </entity> </entity> </document> Write the field as follows instead: <field column="TmpCourseId" template="Course:${Course.CourseId}" name="id"/> -- - Noble Paul | Principal Engineer| AOL | http://aol.com
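The effect of the extra column can be shown with a toy model of one parent row. The model uses plain Java maps, not DIH classes, and the names are hypothetical. If the template output overwrites `CourseId`, the child query sees `Course:1`. Writing it to `TmpCourseId` instead leaves the integer intact for `${Course.CourseId}`. This assumes, as the question implies, that the child entity's query is resolved from the transformed parent row.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not DIH code) of a template transform followed by a child-entity query.
final class TemplateOrderDemo {
    private TemplateOrderDemo() {}

    // Stand-in for TemplateTransformer: writes "Course:<id>" into targetColumn of a copy of the row
    static Map<String, Object> applyTemplate(Map<String, Object> row, String targetColumn) {
        Map<String, Object> out = new HashMap<>(row);
        out.put(targetColumn, "Course:" + row.get("CourseId"));
        return out;
    }

    // Stand-in for resolving ${Course.CourseId} in the Rating entity's query
    static String childQuery(Map<String, Object> parentRow) {
        return "select comment from Rating where Rating.CourseId = " + parentRow.get("CourseId");
    }
}
```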
Re: Greater-than and less-than in data import SQL queries
On Mon, Nov 2, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote: A thought I had on this from a DIH design perspective: would it be better to store the SQL queries in an element rather than an attribute, so that you can wrap them in a CDATA block without messing up the look of the query with &lt; and &gt;? It makes debugging easier (I know find and replace is trivial, but it can be annoying when debugging SQL issues :-)). Actually most of the parsers are forgiving in this aspect. I mean '' and '' are ok in the xml parser shipped with the jdk. On Wed, Oct 28, 2009 at 5:15 PM, Lance Norskog goks...@gmail.com wrote: It is easier to put SQL select statements in a view and just use that view from the DIH configuration file. On Tue, Oct 27, 2009 at 12:30 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Heh, eventually I decided "where 4 > node_depth" was the most pleasing (if slightly WTF-ish) way of writing it... Cheers, Andrew. Erik Hatcher-4 wrote: Use &lt; instead of < in that attribute. That should fix the issue. Remember, it's an XML file, so it has to obey XML encoding rules, which make it ugly, but whatcha gonna do? Erik On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote: Hi, if I have a DataImportHandler query with a greater-than sign in it, like this: <entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth > 4"> everything's fine. However, if it contains a less-than sign: <entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth < 4"> I get this exception: INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml} [Fatal Error] :240:129: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
27-Oct-2009 15:30:49 org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:184)
    at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:424)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
    at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
    at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
    at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at
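If data-config.xml is generated or edited by code, the escaping described above can be done mechanically. Below is a minimal sketch of a hypothetical plain-JDK helper. It replaces the characters that are unsafe inside a double-quoted XML attribute, so that `node_depth < 4` becomes `node_depth &lt; 4`. A standard XML parser reads the escaped value back as `<`.

```java
// Hypothetical helper: escapes a string for use inside a double-quoted XML attribute.
final class XmlAttr {
    private XmlAttr() {}

    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;  // must be escaped first-class, not left raw
                case '<': out.append("&lt;"); break;   // forbidden raw in attribute values
                case '>': out.append("&gt;"); break;   // legal raw, escaped for symmetry
                case '"': out.append("&quot;"); break; // would end the attribute early
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```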
Re: Why does BinaryRequestWriter force the path to be base URL + /update/javabin
Yup, that can be relaxed. It was just a convention. On Tue, Nov 3, 2009 at 5:24 AM, Stuart Tettemer stette...@gmail.com wrote: Hi folks, first of all, thanks for Solr. It is a great piece of work. I have a question about BinaryRequestWriter in the SolrJ project. Why does it force the path of UpdateRequests to be /update/javabin (see BinaryRequestWriter.getPath(String), starting on line 109)? I am extending BinaryRequestWriter specifically to remove this requirement and am interested to know the reasoning behind the initial choice. Thanks for your time, Stuart -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Annotations and reference types
I guess this is not a very good idea. The document itself is a flat data structure, so it is hard to treat it as a nested one. If this were allowed, how deep would we want the nesting to go? The simple solution would be to write setters for b_id and b_name in class A, and those setters can inject the values into B. On Mon, Nov 2, 2009 at 10:05 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Oct 29, 2009 at 7:57 PM, M. Tinnemeyer marc-...@gmx.net wrote: Dear list users, is there a way to store an instance of class A (including the fields from myB) via Solr using annotations? The index should look like: id; name; b_id; b_name -- class A { @Field private String id; @Field private String name; @Field private B myB; } -- class B { @Field("b_id") private String id; @Field("b_name") private String name; } No. I guess you want to represent certain fields in class B and have them as attributes in class A (but all fields belong to the same schema); in that case it could be a worthwhile addition to SolrJ. Can you open an issue? A patch would be even better :) -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
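The suggestion above (setters on class A that push values into B) could look roughly like the sketch below. It rests on two assumptions. First, it relies on SolrJ's `@Field` annotation being allowed on setter methods; the annotations are shown as comments so the example compiles without SolrJ on the classpath. Second, the method names `setBId` and `setBName` are hypothetical. Matching getters are included on the further assumption that the binder may look for them when indexing beans.

```java
// Nested object that is not mapped to Solr directly.
final class B {
    private String id;
    private String name;

    String getId() { return id; }
    String getName() { return name; }
    void setId(String id) { this.id = id; }
    void setName(String name) { this.name = name; }
}

final class A {
    // @Field
    private String id;
    // @Field
    private String name;
    private final B myB = new B();

    // @Field("b_id")  flat index field, injected into the nested B
    public void setBId(String bId) { myB.setId(bId); }
    public String getBId() { return myB.getId(); }

    // @Field("b_name")
    public void setBName(String bName) { myB.setName(bName); }
    public String getBName() { return myB.getName(); }

    B getMyB() { return myB; }
}
```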
Re: How to integrate Solr into my project
Is it a Java project? Did you see this page: http://wiki.apache.org/solr/Solrj ? On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan caroline@gmail.com wrote: Hi, I wish to integrate Solr into my current project. I've played around with the Solr example and got it started in my Tomcat. But the next step is: HOW do I integrate that into my project? You see, Lucene provides an API and a tutorial on which classes I need to instantiate in order to index and search. But Solr seems to be pretty vague on this, as it is a working Solr search server. Can anybody help me by stating, step by step, which classes I should look into in order to integrate Solr into my project? Thanks. regards ~caroLine -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: solr search
Start with the examples in the download. That should help. On Wed, Nov 4, 2009 at 11:14 AM, manishkbawne manish.ba...@gmail.com wrote: Thank you for your reply. I have corrected this error, but now I am getting this error -- HTTP Status 500 - Bad version number in .class file java.lang.UnsupportedClassVersionError: Bad version number in .class file at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass I have checked java -version and javac -version. Both show the same version, 1.5.0_09. How do I get rid of this error? Lance Norskog-2 wrote: The problem is in db-dataconfig.xml. You should start with the example DataImportHandler configuration files. The structure is wrong. First there is a datasource, then there are 'entities' which fetch a document's fields from the datasource. On Fri, Oct 30, 2009 at 9:03 PM, manishkbawne manish.ba...@gmail.com wrote: Hi, I have made the following changes in solrconfig.xml: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">C:/Apache-Tomcat/apache-tomcat-6.0.20/solr/conf/db-data-config.xml</str> </lst> </requestHandler> In db-dataconfig.xml: <dataConfig> <document name="id1"> <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://servername:1433/databasename" user="sa" password="p...@123"/> <entity name="id1" query="select id from be"> <field column="id" name="id1"/> </entity> </document> </dataConfig> In the schema.xml file: <field name="id1" type="string" indexes="true" default="none"/> Please suggest the possible cause of the error. Lance Norskog-2 wrote: Please post your dataimporthandler configuration file. On Fri, Oct 30, 2009 at 4:17 AM, manishkbawne manish.ba...@gmail.com wrote: Thanks for your reply. I am trying to use the database for Solr search but am getting this error..
<abortOnConfigurationError>false</abortOnConfigurationError> in null - java.lang.NullPointerException at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:95) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) at org.apache.solr.core.SolrResourceLoader Can you please suggest some possible solution? Karsten F. wrote: Hi manishkbawne, some general ideas for search improvements are here: http://wiki.apache.org/solr/SolrPerformanceFactors I really like the last idea in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed : use a profiler and ask a more specific question in this forum. Best regards Karsten manishkbawne wrote: I am using Solr search to search through XML files. As I am working with millions of records, the result output is slow. Can anyone please suggest some way to speed up the search result output? -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26128341.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26139946.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26191282.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
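`UnsupportedClassVersionError` ("Bad version number in .class file") generally means that a class was compiled for a newer Java release than the JVM running Tomcat. That fits the report elsewhere in this thread that installing JDK 6 fixed it. Below is a minimal sketch for checking a suspect class file; the class name is hypothetical. It reads the class-file header: the magic number `0xCAFEBABE`, then the minor and major version. Major version 49 corresponds to Java 5 and 50 to Java 6.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reports which Java release a .class file was compiled for.
final class ClassVersion {
    private ClassVersion() {}

    // Reads the class-file header and returns the major version (49 = Java 5, 50 = Java 6, ...)
    static int majorVersion(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        in.readUnsignedShort(); // minor version, ignored here
        return in.readUnsignedShort();
    }

    // Maps major versions from 49 (Java 5) upward to the Java release number
    static int javaRelease(int majorVersion) {
        if (majorVersion < 49) {
            throw new IllegalArgumentException("Pre-Java 5 class file: " + majorVersion);
        }
        return majorVersion - 44;
    }
}
```

For example, if a class extracted from the failing jar reports major version 50 while Tomcat runs on 1.5.0_09, the JVM is too old for that jar.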
Re: Bug with DIH and MySQL CONCAT()?
Thanks. It would be nice to add this to the DIH FAQ. On Wed, Nov 4, 2009 at 8:27 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: Thanks Chantal for the explanation of the issue. Avlesh, that worked great. Thank you! On Nov 4, 2009, at 9:44 AM, Avlesh Singh wrote: Try cast(concat(...) as char) ... Cheers Avlesh On Wed, Nov 4, 2009 at 7:36 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: Hi All, I have an SQL query that begins with SELECT CONCAT('ID', Subject.id, ':', Subject.name, ':L', Subject.level) as subject_name and the query runs great against MySQL from the command line. Since this is a nested entity, schema.xml contains <field name="subject_name" type="string" indexed="true" stored="true" multiValued="true"/> After a full-import, the XML output of a select looks like <arr name="subject_name"> <str>[...@1db4c43</str> <str>[...@6bcef1</str> <str>[...@1df503b</str> <str>[...@c5dbb</str> <str>[...@1ddc3ea</str> <str>[...@6963b0</str> <str>[...@10fe215</str> ... Without a CONCAT it works fine. Is this a bug? Meanwhile, should I do the concatenation somewhere else in the DIH config? Thanks. - Jonathan -- - Noble Paul | Principal Engineer| AOL | http://aol.com
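The `[...@1db4c43`-style values look like Java's default `toString()` of an array. That suggests the driver handed CONCAT's result to DIH as a `byte[]`, which the CAST to char avoids. This is an inference from the output, not something confirmed in the thread. Below is a small plain-Java illustration with a hypothetical class and sample value. It shows the difference between stringifying a `byte[]` directly and decoding it (assuming UTF-8 data).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of how a byte[] column value ends up as "[B@<hash>" text.
final class ByteColumnDemo {
    private ByteColumnDemo() {}

    // What String.valueOf does with a byte[]: array type tag plus identity hash code
    static String naive(Object columnValue) {
        return String.valueOf(columnValue);
    }

    // Decodes byte[] explicitly (assuming UTF-8 data); other values use their normal string form
    static String decode(Object columnValue) {
        if (columnValue instanceof byte[]) {
            return new String((byte[]) columnValue, StandardCharsets.UTF_8);
        }
        return String.valueOf(columnValue);
    }
}
```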
Re: DIH timezone offset
DIH relies on the driver to get the date. It does not do any automatic conversion. Is it possible for the driver to give the date with the right offset? On Thu, Nov 5, 2009 at 3:21 AM, Mike mpiluson...@comcast.net wrote: Hi, I'm importing database records into Solr using the DIH. My dates are not imported correctly: a timezone offset (+4 hours) is being added to them. From the documentation I've read so far, I know Solr tends to be timezone-agnostic, but what could this be? Any tips on where to look would be greatly appreciated. I'm using the trunk version of Solr built on 11/1. Mike -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: DIH timezone offset
Could someone add this here: http://wiki.apache.org/solr/DataImportHandlerFaq On Thu, Nov 5, 2009 at 8:35 PM, mpiluson...@comcast.net wrote: DIH relies on the driver to get the date. It does not do any automatic conversion. Is it possible for the driver to give the date with the right offset? I retried a full-import after setting the Java user.timezone property to UTC, and the dates imported correctly. I've narrowed the problem down to the way SQL Server returns dates. Converting them to ISO-8601 format resolves the issue, but I had to append a 'Z' to the end of the conversion, like so: select convert(varchar(30),datesentutc,126)+'Z' as date from table. I hope this is helpful to someone else. Thanks for the help. Mike -- - Noble Paul | Principal Engineer| AOL | http://aol.com
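Mike's fix boils down to emitting dates in UTC, in the ISO-8601 form Solr expects, with a trailing 'Z'. If that conversion were done in Java instead of SQL, a minimal sketch could look like the one below. The class name is hypothetical, and formatting is pinned to UTC rather than relying on the JVM's user.timezone setting.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: formats an instant the way Solr date fields expect (UTC, trailing 'Z').
final class SolrDateFormat {
    private SolrDateFormat() {}

    static String toUtc(Date date) {
        // a new formatter per call, because SimpleDateFormat is not thread-safe
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }
}
```

`HH` is the 24-hour field; `hh` would produce 12-hour values.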
Re: Solr Replication: How to restore data from last snapshot
If it is a single core, you will have to restart the master. On Sat, Nov 7, 2009 at 1:55 AM, Osborn Chan oc...@shutterfly.com wrote: Thanks. But I have the following use cases: 1) The master index is corrupted, but it has not been replicated to the slave servers. - In this case, I only need to restore the last snapshot. 2) The master index is corrupted, and it has been replicated to the slave servers. - In this case, I need to restore the last snapshot and make sure the slave servers also replicate the restored index from the index server. Assume both cases are in a production environment and I cannot shut down the master and slave servers. Is there any REST API call or something else I can use, without manually running Linux commands and restarting? Thanks, Osborn -Original Message- From: Matthew Runo [mailto:matthew.r...@gmail.com] Sent: Friday, November 06, 2009 12:20 PM To: solr-user@lucene.apache.org Subject: Re: Solr Replication: How to restore data from last snapshot If your master index is corrupt and it hasn't been replicated out, you should be able to shut down the server and remove the corrupted index files. Then copy the replicated index back onto the master and start everything back up. As far as I know, the indexes on the replicated slaves are exactly what you'd have on the master, so this method should work. --Matthew Runo On Fri, Nov 6, 2009 at 11:41 AM, Osborn Chan oc...@shutterfly.com wrote: Hi, I have followed the Solr setup of ReplicationHandler for index replication to slaves. Does anyone know how to restore a corrupted index from a snapshot on the master and force replication of the restored index to the slaves? Thanks, Osborn -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: solr search
Please paste the complete stack trace. On Fri, Nov 6, 2009 at 1:37 PM, manishkbawne manish.ba...@gmail.com wrote: Thanks for the assistance. Actually, I installed JDK 6 and my problem was resolved. But now I am getting this exception: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select PkMenuId from WCM_Menu Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:186) at --- The changes to the db-dataconfig.xml file are as follows: <dataConfig> <document> <entity name="WCM_Menu" query="select PkMenuId from WCM_Menu" fetchSize="1"> <field column="PkMenuId" name="id1"/> </entity> </document> </dataConfig> I don't think the problem is a missing hyphen. Can anybody suggest a way to resolve this error? Manish Bawne Software Engineer Biz Integra Systems www.bizhandel.com Chantal Ackermann wrote: Hi Manish, is this a typo in your e-mail, or is your config file really missing a hyphen? (You repeat the name without the second hyphen several times.) Cheers, Chantal manishkbawne wrote: <str name="config">db-data-config.xml</str> The changes that I have made in the db-dataconfig.xml file are: -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26228077.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Specifying multiple documents in DataImportHandler dataConfig
On Sun, Nov 8, 2009 at 8:25 AM, Bertie Shen bertie.s...@gmail.com wrote: I have figured out a way to solve this problem: just specify a single <document>blah blah blah</document>. Under <document>, specify multiple top-level <entity> entries, each of which corresponds to one table's data. Each top-level entity then maps each of its rows to a document in the Lucene index. <document> in DIH is *NOT* mapped to a document in the Lucene index, while a top-level entity is. I feel the <document> tag is redundant and misleading in the data config and should therefore be removed. There are some common attributes specified at the <document> level. It still acts as a container tag. Cheers. On Sat, Nov 7, 2009 at 9:43 AM, Bertie Shen bertie.s...@gmail.com wrote: I have the same problem. I had thought we could specify multiple <document>blah blah blah</document> elements, each of which maps one table in the RDBMS. But I found that was not the case: DIH only picks the first <document>blah blah blah</document> for indexing. I think Rupert's request and mine are pretty common. Basically, there are multiple tables in the RDBMS, and we want each row in each table to become a document in the Lucene index. How can we write one data-config.xml file that lets DataImportHandler import multiple tables at the same time? Rupert, have you figured out a way to do it? Thanks. On Tue, Sep 8, 2009 at 3:42 PM, Rupert Fiasco rufia...@gmail.com wrote: Maybe I should be clearer: I have multiple tables in my DB that I need to save to my Solr index. In my app code I have logic to persist each table, which maps to an application model, to Solr. This is fine. I am just trying to speed up indexing by using DIH instead of going through my application. From what I understand of DIH, I can specify one dataSource element and then a series of document/entity sets, one for each of my models. But as I said before, DIH only appears to index the first document declared under the dataSource tag.
-Rupert On Tue, Sep 8, 2009 at 4:05 PM, Rupert Fiasco rufia...@gmail.com wrote: I am using the DataImportHandler with a JDBC datasource. From my understanding of DIH, for each of my content types (e.g. Blog posts, Mesh Categories, etc.) I would construct a series of document/entity sets, like <dataConfig> <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://"/> <!-- BLOG ENTRIES --> <document name="blog_entries"> <entity name="blog_entries" query="select id,title,keywords,summary,data,title as name_fc,'BlogEntry' as type from blog_entries"> <field column="id" name="pk_i"/> <field column="id" name="id"/> <field column="title" name="text_t"/> <field column="data" name="text_t"/> </entity> </document> <!-- MESH CATEGORIES --> <document name="mesh_category"> <entity name="mesh_categories" query="select id,name,node_key,name as name_fc,'MeshCategory' as type from mesh_categories"> <field column="id" name="pk_i"/> <field column="id" name="id"/> <field column="name" name="text_t"/> <field column="node_key" name="string"/> <field column="name_fc" name="facet_value"/> <field column="type" name="type_t"/> </entity> </document> </datasource> </dataConfig> Solr parses this just fine and allows me to issue a /dataimport?command=full-import. It runs, but only against the first document (blog_entries); it doesn't run against the second document (mesh_categories). If I remove the two document elements and wrap both entity sets in just one document tag, then both sets get indexed, which seemingly achieves my goal. This just doesn't make sense from my understanding of how DIH works. My two content types are indeed separate, so they logically represent two document types, not one. Is this correct? What am I missing here? Thanks -Rupert -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Getting started with DIH
On Mon, Nov 9, 2009 at 12:43 PM, Michael Lackhoff mich...@lackhoff.de wrote:
On 09.11.2009 06:54 Erik Hatcher wrote:
The brackets probably come from it being transformed as an array. Try saying multiValued="false" on your field specifications.

Indeed. Thanks Erik, that was it.

My first steps with DIH showed me what a powerful tool this is, but although the DIH wiki page might well be the longest in the whole wiki, there are so many mysteries left for the uninitiated. Is there any other documentation I might have missed?

There is an FAQ page and that is it: http://wiki.apache.org/solr/DataImportHandlerFaq

It just started off as a single page, and then the features piled up and the page just got bigger. We are thinking of splitting it into smaller, more manageable pages.

Thanks
-Michael
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Getting started with DIH
The tried and tested strategy is to post the question on this mailing list with your data-config.xml.

On Mon, Nov 9, 2009 at 1:08 PM, Michael Lackhoff mich...@lackhoff.de wrote:
On 09.11.2009 08:20 Noble Paul നോബിള് नोब्ळ् wrote:
It just started off as a single page, and then the features piled up and the page just got bigger. We are thinking of splitting it into smaller, more manageable pages.

Oh, I like it the way it is, as one page, so that the browser's full-text search can help. It is just that the features and power seem to grow even faster than the wiki page ;-)

E.g. I couldn't find out how to add a second RSS feed. I tried with a second entity parallel to the slashdot one but got an exception: "java.io.IOException: FULL", whatever that means, so I must be doing something wrong, but I couldn't find a hint.

-Michael
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to import multiple RSS-feeds with DIH
On Mon, Nov 9, 2009 at 1:26 PM, Michael Lackhoff mich...@lackhoff.de wrote:
[A new thread for this particular problem]

On 09.11.2009 08:44 Noble Paul നോബിള് नोब्ळ् wrote:
The tried and tested strategy is to post the question on this mailing list with your data-config.xml.

See my data-config.xml below. The first is the usual slashdot example with my 'id' addition, the second a very simple additional feed. The second example works if I delete the slashdot feed, but as I said, I would like to have them both.

When you say the second example does not work, what does that mean? Is there an exception? (If so, please post the stack trace.)

-Michael

    <dataConfig>
      <dataSource type="HttpDataSource" />
      <document>
        <entity name="slashdot"
                pk="link"
                url="http://rss.slashdot.org/Slashdot/slashdot"
                processor="XPathEntityProcessor"
                forEach="/RDF/channel | /RDF/item"
                transformer="TemplateTransformer,DateFormatTransformer">
          <field column="source" xpath="/RDF/channel/title" commonField="true" />
          <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
          <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
          <field column="title" xpath="/RDF/item/title" />
          <field column="link" xpath="/RDF/item/link" />
          <field column="id" template="${slashdot.link}" />
          <field column="description" xpath="/RDF/item/description" />
          <field column="creator" xpath="/RDF/item/creator" />
          <field column="item-subject" xpath="/RDF/item/subject" />
          <field column="slash-department" xpath="/RDF/item/department" />
          <field column="slash-section" xpath="/RDF/item/section" />
          <field column="slash-comments" xpath="/RDF/item/comments" />
          <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
        </entity>
        <entity name="heise"
                pk="link"
                url="http://www.heise.de/newsticker/heise.rdf"
                processor="XPathEntityProcessor"
                forEach="/RDF/channel | /RDF/item"
                transformer="TemplateTransformer">
          <field column="source" xpath="/RDF/channel/title" commonField="true" />
          <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
          <field column="title" xpath="/RDF/item/title" />
          <field column="link" xpath="/RDF/item/link" />
          <field column="id" template="${heise.link}" />
        </entity>
      </document>
    </dataConfig>
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: solr and hibernate integration
The point is that the usual complex POJO mapping does not work in Solr. For all the supported cases, SolrJ mapping works well. To answer your question, I am not aware of anybody making it work with Hibernate.

On Mon, Nov 9, 2009 at 1:54 PM, Kiwi de coder kiwio...@gmail.com wrote:
hi,

I have a project which requires indexing POJOs and searching them from the database. However, the current POJO support is limited to field values; it lacks support for complex domain object models with composite elements, collections, etc.

Hibernate Search has done a great job of indexing complex POJOs. I am wondering whether someone has written a plug-in that can handle complex POJOs (like what Hibernate Search does for indexing)?

kiwi
-- happy hacking !
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: [DIH] SqlEntityProcessor does not recognize onError attribute
On Mon, Nov 9, 2009 at 4:24 PM, Sascha Szott sz...@zib.de wrote:
Hi all,

as stated in the Solr wiki, Solr 1.4 allows you to specify an onError attribute for *each* entity listed in the data config file (it is considered one of the default attributes). Unfortunately, the SqlEntityProcessor does not recognize the attribute's value: if an SQL exception is thrown somewhere inside the constructor of ResultSetIterator (an inner class of JdbcDataSource), Solr's import exits immediately, even though onError is set to continue or skip.

Why are database-related exceptions (e.g., a table does not exist, or a query syntax error occurs) not covered by the onError attribute? In my opinion, there are use cases that would profit from such exception handling inside Solr (for example, cases where the existence of certain database tables or views is not predictable).

We decided that DB errors should not be ignored, because errors such as "table does not exist" can be really serious.

Should I raise a JIRA issue about this?

Raise an issue; it can be fixed.

-Sascha
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: [DIH] blocking import operation
DIH imports are really long running. There is a good chance that the connection times out or breaks in between. How about a callback?

On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott sz...@zib.de wrote:
Hi all,

currently, DIH's import operations only work asynchronously. Therefore, after an import request is submitted, DIH returns immediately, while the import process (in case a large amount of data needs to be indexed) continues asynchronously behind the scenes.

So, what is the recommended way to check whether the import process has already finished? Or, still better, is there any method or workaround that will block the import operation's caller until the operation has finished?

In my application, the DIH receives some URL parameters which are used to determine the database name that is used within data-config.xml, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

Since only one DIH, /dataimport, is defined, but several databases need to be indexed, this command has to be issued several times, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

... wait until /dataimport?command=status says "Indexing completed" (but without using a loop that checks it again and again) ...

http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false

A suitable solution, at least IMHO, would be an additional DIH parameter that determines whether the import call is blocking or non-blocking (the default). As far as I can see, this could be accomplished, since Solr can execute more than one import operation at a time (it starts a new thread for each). Perhaps my question is somehow related to the discussion [1] on ParallelDataImportHandler.

Best,
Sascha

[1] http://www.lucidimagination.com/search/document/a9b26ade46466ee
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
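Until a blocking mode or a callback exists, the usual workaround is the polling loop Sascha would rather avoid. A minimal JDK-only sketch, assuming the /dataimport?command=status response contains the word "idle" once nothing is running (an assumption about the response body, not a documented contract); the class and method names are hypothetical, and the status source is injected so the logic can be exercised without a live Solr.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class DihImportWaiter {

    /** Fetches a URL and returns the response body as a String (plain JDK, no Solr client). */
    public static String fetch(String url) {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toString();
    }

    /**
     * Polls statusSource until the response reports "idle".
     * Returns true when idle, false on timeout or interrupt.
     */
    public static boolean waitUntilIdle(Supplier<String> statusSource,
                                        long pollIntervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            // Assumption: the status body contains "idle" when no import is running.
            if (statusSource.get().contains("idle")) {
                return true;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }
}
```

With Sascha's URLs, the sequence would be: fetch the dbname=foo full-import URL, then call waitUntilIdle(() -> fetch("http://localhost:8983/solr/dataimport?command=status"), 2000, 3600000), then fetch the dbname=bar&clean=false URL. The sketch cannot tell "finished" from "not started yet", so comparing the status messages before and after would be more robust; the callback Noble suggests would remove the loop altogether.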
Re: A question about how to make schema.xml change take effect
If you are using a multicore instance, you may just reload the core.

On Tue, Nov 10, 2009 at 12:07 PM, Ritesh Gurung rit...@srijan.in wrote:
Well, every time you make a change in the schema.xml file you need to restart the Tomcat server.

On Tue, Nov 10, 2009 at 11:59 AM, Bertie Shen bertie.s...@gmail.com wrote:
Hey folks,

When I update schema.xml, I find that most of the time I do not need to restart Tomcat for the change to take effect. But sometimes I have to restart the Tomcat server for the change to take effect. For example, when I changed a field's data type from sint to tlong, I called http://host:port/solr/dataimport?command=full-import&commit=true&clean=true. I clicked the [Schema] link on the admin page and found the data type was tlong; but when I clicked [Schema Browser] and that field's link, I found the data type was still sint. When I ran a search, the result also showed the field was still sint. The only way I found to make the change effective is to restart Tomcat. I want to confirm whether this is intended or a bug.

Thanks.
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
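Noble's suggestion can be scripted. A minimal JDK-only sketch, assuming a multicore setup with CoreAdmin at the default /admin/cores path and a core named core0 (both assumptions to adapt), that issues a RELOAD so the core picks up the edited schema.xml without restarting Tomcat. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

public class CoreReload {

    /** Builds a CoreAdmin RELOAD URL, e.g. http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0 */
    public static String reloadUrl(String solrBase, String coreName) {
        try {
            return solrBase + "/admin/cores?action=RELOAD&core=" + URLEncoder.encode(coreName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends the reload request and returns the HTTP status code (200 is expected on success). */
    public static int reload(String solrBase, String coreName) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(reloadUrl(solrBase, coreName)).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

After the reload, running the full-import again ensures the documents are indexed with the new field type.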
Re: Configuring 1.4 - multi master setup?
See the "setting up a repeater" section on this page: http://wiki.apache.org/solr/SolrReplication

On Tue, Nov 10, 2009 at 5:17 PM, Kevin Jackson foamd...@gmail.com wrote:
Hi all,

We have a situation where we would like to have:
1 master server (which creates the index)
1 input slave server (which receives the updated index from the master)
n slaves (which receive the updated index from the input slave server)

This is to prevent each of the n slaves from polling the master server.

a: Is this setup possible?
b: Has anyone done anything like this? If so, do you have any advice?

This is all with 1.4, so we would be using the built-in Java replication, not snapshooter/snappuller.

Thanks,
Kev
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Configuring 1.4 - multi master setup?
On Tue, Nov 10, 2009 at 7:58 PM, Walter Underwood wun...@wunderwood.org wrote:
Replication creates very little load on the master, so you should not need a separate machine just to handle replication. Why do you think you need that?

Correct. A repeater is set up when your main master is not located on the same LAN.

wunder

On Nov 10, 2009, at 5:37 AM, Kevin Jackson wrote:
Hi,

2009/11/10 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
see the setting up a repeater section in this page http://wiki.apache.org/solr/SolrReplication

Doh! Sorry for the noise.

Thanks,
Kev
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: [DIH] blocking import operation
Yes, open an issue. This is a trivial change.

On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott sz...@zib.de wrote:
Noble,

Noble Paul wrote:
DIH imports are really long running. There is a good chance that the connection times out or breaks in between.

Yes, you're right, I missed that point (in my case imports take no longer than a minute).

How about a callback?

Thanks for the hint. There was a discussion on adding a callback URL to DIH a month ago, but it seems that no issue was raised. So, up to now it's only possible to implement an appropriate Solr EventListener. Should we open an issue for supporting callback URLs?

Best,
Sascha
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
Are you sure the data comes back with the same name? Some DBs return field names in ALL CAPS.

You may also try doing a delta-import using a full-import: http://wiki.apache.org/solr/DataImportHandlerFaq#My_delta-import_goes_out_of_memory_._Any_workaround_.3F

On Wed, Nov 11, 2009 at 9:55 PM, Mark Ellul m...@catalystic.com wrote:
I have 2 entities from the root node, not sure if that makes a difference!

On Wed, Nov 11, 2009 at 4:49 PM, Mark Ellul m...@catalystic.com wrote:
Hi,

I have an interesting issue... I am trying to do delta imports with Solr 1.4 on a PostgreSQL 8.3 database. When I run a delta import with the entity below, I get an exception (see below the entity definition) showing the query it's trying to run, and you can see that it's not populating the where clause of my deltaImportQuery.

I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id} and get the same exceptions.

Am I missing something obvious? Any help would be appreciated!

Regards
Mark

    <entity name="Tweeter" pk="twitter_id"
            query="select twitter_id, twitter_id as pk, 1 as site_id, screen_name from api_tweeter WHERE tweet_mapreduce_on IS NOT NULL;"
            transformer="TemplateTransformer"
            deltaImportQuery="select twitter_id, twitter_id as pk, 1 as site_id, screen_name from api_tweeter where twitter_id=${dataimporter.delta.twitter_id };"
            deltaQuery="select twitter_id from api_tweeter where modified_on > '${dataimporter.last_index_time}' and tweet_mapreduce_on IS NOT NULL;">
      <field name="twitter_id" column="twitter_id" />
    </entity>

INFO: Completed parentDeltaQuery for Entity: Tweeter
Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: Tweeter document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select twitter_id, twitter_id as pk, 1 as site_id, screen_name from api_tweeter where twitter_id=; Processing Document # 1 at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at end of input Position: 1197 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:345) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:246) ... 
11 more Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport SEVERE: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select twitter_id, twitter_id as pk, 1 as site_id, screen_name from api_tweeter where twitter_id=; Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at
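Noble's first question (does the column come back under the same name?) can be illustrated with a deliberately simplified, hypothetical placeholder resolver. It is not DIH's actual variable resolver; it only mimics the symptom visible in the log above, where ${dataimporter.delta.twitter_id} resolved to nothing and produced "where twitter_id=;". In this sketch the lookup is exact and case-sensitive, so a row keyed as TWITTER_ID leaves the placeholder empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaPlaceholderDemo {

    // Matches ${...} placeholders, capturing the variable name.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]*)\\}");

    /**
     * Hypothetical resolver: replaces each ${name} with vars.get(name) using an exact,
     * case-sensitive lookup, and substitutes "" when the name is missing.
     */
    public static String resolve(String template, Map<String, Object> vars) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String deltaImportQuery =
                "select screen_name from api_tweeter where twitter_id=${dataimporter.delta.twitter_id}";

        Map<String, Object> lowerCase = new HashMap<>();
        lowerCase.put("dataimporter.delta.twitter_id", 42);
        System.out.println(resolve(deltaImportQuery, lowerCase)); // ...where twitter_id=42

        // If the deltaQuery row came back keyed as TWITTER_ID, the exact lookup misses.
        Map<String, Object> upperCase = new HashMap<>();
        upperCase.put("dataimporter.delta.TWITTER_ID", 42);
        System.out.println(resolve(deltaImportQuery, upperCase)); // ...where twitter_id=
    }
}
```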
Re: Persist in Core Admin
On Thu, Nov 12, 2009 at 3:13 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
It looks like our CoreAdmin wiki page doesn't cover the persist action? http://wiki.apache.org/solr/CoreAdmin

I'd like to be able to persist the cores to solr.xml, even if <solr persistent="false">. It seems like the persist action does this?

Yes. But you will have to specify a 'file' parameter.
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Fwd: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
-- Forwarded message --
From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
Date: 2009/11/12
Subject: Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
To: Mark Ellul m...@catalystic.com

On Thu, Nov 12, 2009 at 8:17 PM, Mark Ellul m...@catalystic.com wrote:
I think I got it working, thanks for your response... It worked once I removed the TemplateTransformer from the entity. Could that have been the issue? Could the TemplateTransformer have been changing ${dataimporter.delta.twitter_id} into nothing?

But though TemplateTransformer is mentioned, it is not applied to any field, is it? I do not see the attribute 'template' on any field.

Regards
Mark

2009/11/12 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
delta-import is slightly tricky. There is nothing that lets you see the intermediate data. That is why I suggested doing a delta-import using full-import; it can probably reveal what the problem is.

On Thu, Nov 12, 2009 at 6:05 PM, Mark Ellul m...@catalystic.com wrote:
Hi Noble,

Thanks for the response. CAPS is not the issue.

Can you please confirm the link below is the code for the SqlEntityProcessor in release 1.4?
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.0/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/SqlEntityProcessor.java?revision=834197&view=markup

Is there a way to output what is returned from the deltaQuery? Or the actual queries sent to the database server?

Regards
Mark

2009/11/12 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
Are you sure the data comes back with the same name? Some DBs return field names in ALL CAPS.

You may also try doing a delta-import using a full-import: http://wiki.apache.org/solr/DataImportHandlerFaq#My_delta-import_goes_out_of_memory_._Any_workaround_.3F

On Wed, Nov 11, 2009 at 9:55 PM, Mark Ellul m...@catalystic.com wrote:
I have 2 entities from the root node, not sure if that makes a difference!
Re: javabin in .NET?
Is there any tool to directly port Java to .NET? Then we could extract the client part of the javabin code and convert it.

On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Has anyone looked into using the javabin response format from .NET (instead of SolrJ)?

It's mainly a curiosity.

How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess, being the best choice) to handle this response format?

Thanks,
Erik
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: [DIH] concurrent requests to DIH
I guess SOLR-1352 should solve all the performance problems. I am currently working on an implementation and hope to submit a patch soon.

On Thu, Nov 12, 2009 at 8:05 PM, Sascha Szott sz...@zib.de wrote:
Hi Avlesh,

Avlesh Singh wrote:
1. Is it considered good practice to set up several DIH request handlers, one for each possible parameter value?

Nothing wrong with this. My assumption is that you want to do this to speed up indexing. Each DIH instance would block all others once a Lucene commit for the former is performed.

Thanks for this clarification.

2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): it is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time.

Nope. I had done a similar exercise in my quest to write a ParallelDataImportHandler. This thread might be of interest to you: http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler. Though there is a ticket in JIRA, I haven't been able to contribute this back. If you think this is what you need, lemme know.

Actually, I've already read this thread. In my opinion, both support for batch processing and multi-threading are important extensions of DIH's current capabilities, though issue SOLR-1352 mainly targets the latter. Is your PDIH implementation able to deal with batch processing right now?

Best,
Sascha

On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott sz...@zib.de wrote:
Hi all,

I'm using the DIH in a parameterized way by passing request parameters that are used inside my data-config. All imports end up in the same index.

1. Is it considered good practice to set up several DIH request handlers, one for each possible parameter value?

2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value.
But this entails a limitation (as far as I see): it is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. However, if several request handlers are used (as in 1.), concurrent requests (to the different handlers) are possible. So, how can this limitation be overcome?

Best,
Sascha
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
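Sascha's option 1 (one DIH request handler per parameter value) is what makes concurrent imports possible. A minimal sketch of firing full-imports at several such handlers in parallel with an ExecutorService; the handler paths (/dataimport-foo, /dataimport-bar) and class name are hypothetical, the actual HTTP call is injected as a function, and clean=false is passed, as in Sascha's second URL earlier, so that imports into the same index do not clear each other's documents. As Avlesh notes, the handlers still block one another once a Lucene commit is performed, so the speed-up is bounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDihTrigger {

    /**
     * Sends one full-import request per handler path concurrently and returns the
     * responses in the same order as handlerPaths. sender performs the HTTP GET.
     */
    public static List<String> triggerAll(String solrBase, List<String> handlerPaths,
                                          Function<String, String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, handlerPaths.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : handlerPaths) {
                String url = solrBase + path + "?command=full-import&clean=false";
                futures.add(pool.submit(() -> sender.apply(url)));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> future : futures) {
                responses.add(future.get()); // waits for each request in submission order
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```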
Re: Type converters for DocumentObjectBinder
Create a setter method for the field which takes a String, and apply the annotation there. Example:

    private Calendar validFrom;

    @Field
    public void setValidFrom(String s) {
        // convert to a Calendar object and set the field
    }

On Fri, Nov 13, 2009 at 12:24 PM, paulhyo st...@ouestil.ch wrote:
Hi,

I would like to know if there is a way to add type converters when using getBeans. I need conversion when updating (Calendar -> String) and when searching (String -> Calendar).

The bean class defines:

    @Field
    private Calendar validFrom;

but the type received within the query response is a String (2009-11-13)...

Currently I get this error:

java.lang.RuntimeException: Exception while setting value : 2009-09-16 on private java.util.Calendar ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360) at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342) at org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55) at org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324) at ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38) at ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41) at ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:164) at junit.framework.TestCase.runBare(TestCase.java:130) at junit.framework.TestResult$1.protect(TestResult.java:106) at
junit.framework.TestResult.runProtected(TestResult.java:124) at junit.framework.TestResult.run(TestResult.java:109) at junit.framework.TestCase.run(TestCase.java:120) at junit.framework.TestSuite.runTest(TestSuite.java:230) at junit.framework.TestSuite.run(TestSuite.java:225) at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.lang.IllegalArgumentException: Can not set java.util.Calendar field ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to java.lang.String at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146) at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150) at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63) at java.lang.reflect.Field.set(Field.java:657) at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:354) ... 24 more Thank you in advance Paulhyo -- View this message in context: http://old.nabble.com/Type-converters-for-DocumentObjectBinder-tp26332174p26332174.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
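Noble's setter approach, filled in as a self-contained sketch. It assumes the stored value looks like 2009-11-13 (yyyy-MM-dd, as in paulhyo's response); the class name PersonBean and the helper getValidFromAsString are hypothetical, and the @Field annotation is only mentioned in a comment so the code compiles without SolrJ on the classpath. The second method covers the update direction (Calendar -> String).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class PersonBean {

    // Assumed format of the stored value, e.g. "2009-11-13".
    private static final String DATE_PATTERN = "yyyy-MM-dd";

    private Calendar validFrom;

    // With SolrJ on the classpath this setter would carry @Field, so the binder
    // passes it the String value and the conversion happens here.
    public void setValidFrom(String value) {
        SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN); // not thread-safe: one per call
        format.setLenient(false);
        try {
            Calendar calendar = new GregorianCalendar();
            calendar.setTime(format.parse(value));
            this.validFrom = calendar;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public Calendar getValidFrom() {
        return validFrom;
    }

    /** Update direction (Calendar -> String), formatted with the same pattern. */
    public String getValidFromAsString() {
        return validFrom == null ? null : new SimpleDateFormat(DATE_PATTERN).format(validFrom.getTime());
    }
}
```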
Re: Data import problem with child entity from different database
No obvious issues. You may post your entire data-config.xml. Do it without CachedSqlEntityProcessor first and then apply that later.

On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Morning all,

I'm having problems joining a child entity from one database to a parent from another...

My entity definitions look like this (names changed for brevity):

    <entity name="parent" dataSource="db1" query="select a, b, c from parent_table">
      <entity name="child" dataSource="db2" onError="continue"
              query="select c, d from child_table where c = '${parent.c}'" />
    </entity>

c is getting indexed fine (it's stored; I can see field 'c' in the search results) but child.d isn't. I know the child table has data for the corresponding parent rows, and I've even watched the SQL queries against the child table appearing in Oracle's SQL Developer as the DataImportHandler runs. But no content for child.d gets into the index.

My schema contains a definition for a field called d like so:

    <field name="d" type="keywords_ids" indexed="true" stored="true" multiValued="true" termVectors="true" />

(keywords_ids is a conservatively-analyzed text type which has worked fine in other contexts.)

Two things occur to me.

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables is just a char(4), nothing fancy. Could something weird with character encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't matter, should it?

Additional data points: I also tried using the CachedSqlEntityProcessor to do in-memory table caching of child, but it didn't work then either. I got a lot of error messages like this:

No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work round it), or an unexpected case (if so I'll file a bug report), please shout. I'm using 1.4.

Yet again, many thanks :-)

Andrew.
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: javabin in .NET?
The javabin format does not have many dependencies. It may have 3-4 classes, and that is it.

On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote:
Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries.
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Data import problem with child entity from different database
I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Noble Paul നോബിള് नोब्ळ्-2 wrote: no obvious issues. you may post your entire data-config.xml Here it is, exactly as in the last attempt but with usernames etc. removed. Ignore the comments and the unused FileDataSource... http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml Noble Paul നോബിള് नोब्ळ्-2 wrote: do w/o CachedSqlEntityProcessor first and then apply that later Yep, that was just a bit of a wild stab in the dark to see if it made any difference. Thanks, Andrew. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: javabin in .NET?
OK. Is anyone trying it out? Where is this code? I can try to help. On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .NET, but IIRC they never really worked. Anyway, it's not a big deal; it should be a straightforward job. Testing it thoroughly cross-platform is another thing, though. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: javabin in .NET?
For a client, the marshal() part is not important; unmarshal() is probably all you need. On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Original code is here: http://bit.ly/hkCbI I just started porting it here: http://bit.ly/37hiOs It needs: tests/debugging, porting NamedList, SolrDocument, SolrDocumentList. Thanks for any help! Cheers, Mauricio -- - Noble Paul | Principal Engineer| AOL | http://aol.com
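Following the decode-only advice, a client-side port needs only the reading half. One small piece of that half is splitting a tag byte into a type and an inline size. The sketch below assumes the layout used by Solr's JavaBinCodec for length-carrying types: the high 3 bits select the type (0x20 for strings, for example), the low 5 bits hold the size, and 0x1F means "31 plus a VInt that follows". Treat those constants as assumptions to check against the JavaBinCodec source; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical decode-only helper: splits a javabin-style tag byte into type and size.
class TagSizeReader {

    // Same 7-bits-per-byte VInt scheme as assumed for JavaBinCodec.readVInt.
    static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Assumed layout: high 3 bits = type, low 5 bits = size,
    // where a size of 0x1F means "31 plus a VInt that follows".
    static int[] readTypeAndSize(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        int type = tag & 0xE0;
        int size = tag & 0x1F;
        if (size == 0x1F) {
            size += readVInt(in);
        }
        return new int[]{type, size};
    }

    static int[] decode(byte[] bytes) {
        try {
            return readTypeAndSize(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] small = decode(new byte[]{0x23});       // type 0x20, size 3
        int[] large = decode(new byte[]{0x3F, 0x05}); // type 0x20, size 31 + 5 = 36
        System.out.println(small[0] + " " + small[1] + " / " + large[0] + " " + large[1]); // 32 3 / 32 36
    }
}
```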
Re: javabin in .NET?
Start with a JavabinDecoder only, so that the class is simple to begin with. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: DataImportHandler Questions-Load data in parallel and temp tables
On Mon, Nov 16, 2009 at 6:25 PM, amitj am...@ieee.org wrote: Is there also a way we can include some kind of annotation on the schema field and send the data retrieved for that field to an external application? We have a requirement where we need some data fields (out of the fields for an entity defined in data-config.xml) to act as entities for entity extraction and autocomplete purposes, and we are using an external application for that. No, it is not possible in Solr now. Noble Paul നോബിള് नोब्ळ् wrote: Writing to a remote Solr through SolrJ is in the cards. I may even take it up after the 1.4 release. For now your best bet is to extend the class SolrWriter and override the corresponding methods for add/delete. 2009/4/27 Amit Nithian anith...@gmail.com: All, I have a few questions regarding the data import handler. We have some pretty gnarly SQL queries to load our indices and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated. Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr Document) is done sequentially, and I was looking to make this execute in parallel so that I can have multiple threads processing different parts of the resultset and loading documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later for inner joins, and was wondering how best to go about this. I am thinking of adding support in DIH for the following: 1) Temporary tables (maybe call them temporary entities)? --Specific only to SQL though, unless it can be generalized to other sources.
2) Parallel support - including some mechanism to get the number of records (whether it be count or the MAX(custom_id)-MIN(custom_id)) 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). If any of these exist or anyone else is working on this (OR you have better suggestions), please let me know. Thanks! Amit -- - Noble Paul | Principal Engineer| AOL | http://aol.com
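Parallel loading is not something DIH offers, per the reply in this thread. Purely to illustrate Amit's proposal of partitioning by MAX(custom_id)-MIN(custom_id), here is a hedged Java sketch that splits an inclusive ID range into contiguous chunks, one per worker thread. The class and method names are hypothetical. Each chunk would still need its own query, for example a WHERE custom_id BETWEEN lo AND hi clause, and gaps in the IDs make the row counts per chunk uneven.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: partitions [min, max] into contiguous ranges for parallel workers.
class IdRangeSplitter {

    // Splits the inclusive range [min, max] into at most `parts` non-overlapping
    // [lo, hi] chunks whose sizes differ by at most one.
    static List<long[]> split(long min, long max, int parts) {
        if (parts <= 0 || max < min) {
            throw new IllegalArgumentException("need parts > 0 and max >= min");
        }
        long total = max - min + 1;
        int n = (int) Math.min(parts, total);
        long base = total / n;
        long extra = total % n; // the first `extra` chunks get one more id
        List<long[]> chunks = new ArrayList<>();
        long lo = min;
        for (int i = 0; i < n; i++) {
            long size = base + (i < extra ? 1 : 0);
            long hi = lo + size - 1;
            chunks.add(new long[]{lo, hi});
            lo = hi + 1;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // custom_id BETWEEN 1 AND 4 / 5 AND 7 / 8 AND 10
        for (long[] c : split(1, 10, 3)) {
            System.out.println("custom_id BETWEEN " + c[0] + " AND " + c[1]);
        }
    }
}
```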
Re: javabin in .NET?
On Mon, Nov 16, 2009 at 5:55 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Yep, I think I mostly nailed the unmarshalling. Need more tests though. And then integrate it into SolrNet. Is there any way (or are there any plans) to have an update handler that accepts javabin? There is already one. Look at BinaryRequestWriter. But I would say it may not make a lot of difference, as indexing is a back-end operation and slight perf improvements won't matter much. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: $DeleteDocbyQuery in solr 1.4 is not working
Are you sure that the doc with the same id was not created after that? On Mon, Nov 16, 2009 at 11:12 PM, Mark Ellul m...@catalystic.com wrote: Hi, I have added a deleted field in my database, and am using the DataImportHandler to add rows to the index... I am using Solr 1.4. I have added the deleted field to the query and the RegexTransformer... and the field definition below: <field column="$deleteDocByQuery" regex="^true$" replaceWith="id:${List.id}" sourceColName="deleted"/> When I run the deltaImport command... I see the below output: INFO: [] webapp=/solr path=/dataimport params={command=delta-import&debug=true&expungeDeletes=true} status=0 QTime=1 Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: List Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity List with URL: jdbc:postgresql://localhost:5432/tlists Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 4 Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: List rows obtained : 1 Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: List rows obtained : 0 Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: List Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.SolrWriter deleteByQuery INFO: Deleting documents from Solr with query: 
id:api__list__365522 Nov 16, 2009 5:29:10 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/mnt/solr-index/index,segFN=segments_r,version=1257863009839,generation=27,filenames=[_bg.fdt, _bg.tii, segments_r, _bg.fnm, _bg.nrm, _bg.fdx, _bg.prx, _bg.tis, _bg.frq] Nov 16, 2009 5:29:10 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1257863009839 Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully It says it's deleting the document... but when I do the search it's still showing up. Any ideas? Regards Mark -- - Noble Paul | Principal Engineer| AOL | http://aol.com
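The transformer rule in Mark's config can be read as: if the deleted column matches ^true$, emit the delete query id:<List.id>. A hedged Java sketch of that mapping follows; it illustrates the rule's intent only and is not RegexTransformer's code, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration of regex="^true$" replaceWith="id:${List.id}" on the deleted column.
class DeleteQuerySketch {
    private static final Pattern DELETED = Pattern.compile("^true$");

    // Returns the delete-by-query string for a flagged row, or empty otherwise.
    static Optional<String> deleteQueryFor(String deletedColumn, String listId) {
        if (deletedColumn != null && DELETED.matcher(deletedColumn).matches()) {
            return Optional.of("id:" + listId);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(deleteQueryFor("true", "api__list__365522")); // Optional[id:api__list__365522]
        System.out.println(deleteQueryFor("false", "api__list__1"));     // Optional.empty
    }
}
```

The log line "Deleting documents from Solr with query: id:api__list__365522" matches this output, so the open question is the one Noble asks: whether the same document is indexed again after the delete.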
Re: $DeleteDocbyQuery in solr 1.4 is not working
Why don't you add a new timestamp field? You can use the TemplateTransformer with the formatDate() function. On Tue, Nov 17, 2009 at 5:49 PM, Mark Ellul m...@catalystic.com wrote: Hi Noble, Excellent question... should the field that does the deleting be in a different entity from the one that does the addition and updating? If so, that could be the issue; I have the field that does the DeleteByQuery command inside the entity that does the adding. Is there some kind of document metadata where the create date and update date are shown? How would I see this metadata if it exists? Regards Mark On 11/17/09, Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote: The question is, did your new delta-import create the doc again? On Tue, Nov 17, 2009 at 4:41 PM, Mark Ellul m...@catalystic.com wrote: The doc already existed before the delta-import was run. And it exists afterwards... even though it says it's deleting it. Any ideas of what I can try? -- - Noble Paul | Principal Engineer| AOL | http://aol.com
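Whatever produces the timestamp value, Solr date fields expect UTC in ISO-8601 form with a trailing Z (for example 2009-11-17T12:19:00Z). A minimal Java sketch of producing such a string; the class name is hypothetical, and in a DIH setup the value would come from the transformer rather than from Java code.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: formats an instant the way Solr date fields expect (UTC, trailing Z).
class SolrTimestamp {

    // Truncates to whole seconds; Instant.toString() then yields e.g. 2009-11-17T12:19:00Z.
    static String format(Instant instant) {
        return instant.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.now()));
    }
}
```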
Re: Control DIH from PHP
You can pass the uniqueId as a request parameter and use it in an SQL query: http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters --Noble On Thu, Nov 19, 2009 at 3:53 PM, Pablo Ferrari pabs.ferr...@gmail.com wrote: More specifically, I'm looking to update only one document using its unique ID: I don't want the DIH to look up the whole database because I already know the unique ID that has changed. Pablo 2009/11/19 Pablo Ferrari pabs.ferr...@gmail.com Hello! After working on Solr document updates using direct PHP code (using the SolrClient class), I want to use the DIH (Data Import Handler) to update my documents. Does anyone know how I can send commands to the DIH from PHP? Any idea or tutorial will be of great help because I'm not finding anything useful so far. Thank you for your time! Pablo Tinkerlabs -- - Noble Paul | Principal Engineer| AOL | http://aol.com
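A hedged Java sketch of building such a request (the same URL works from PHP or any other HTTP client). It assumes a handler at /solr/dataimport, a made-up parameter name uid, and an entity query that reads it as ${dataimporter.request.uid}. It also passes clean=false, because a full-import otherwise clears the index before indexing by default.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a DIH request that imports a single document by id.
class DihRequestUrl {

    // "uid" is a hypothetical parameter name; data-config.xml would use it as, e.g.,
    // query="select * from docs where id = '${dataimporter.request.uid}'"
    static String singleDocImport(String solrBase, String uid) {
        try {
            return solrBase + "/dataimport?command=full-import&clean=false&commit=true&uid="
                    + URLEncoder.encode(uid, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(singleDocImport("http://localhost:8983/solr", "42"));
    }
}
```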
Re: help with dataimport delta query
I guess the field names do not match. In the deltaQuery you are selecting the field id, and in the deltaImportQuery you use the field as ${dataimporter.delta.job_jobs_id}. I guess it should be ${dataimporter.delta.id}. On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have Solr all working nicely, except I'm trying to get deltas to work on my data import handler. Here is a simplification of my data import config. I have a table called Book which has categories; I'm doing subqueries for the category info and calling a JavaScript helper. This all works perfectly for the regular query. I added these lines for the delta stuff: deltaImportQuery="SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}'" deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'" Basically I'm trying to get the rows whose lastModifiedDate is newer than the last index (or delta index). I run: http://localhost:8983/solr/dataimport?command=delta-import And it says in the logs: Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:0.21 But the browser says no documents added/modified (even though one record in db is a match) Is there a way to turn debugging so I can see the queries the DIH is sending to the db? Any other ideas of what I could be doing wrong? 
Thanks, Joel
<document name="doc">
  <entity name="item" query="SELECT f.id, f.title FROM Book f WHERE f.inMyList=1" deltaImportQuery="SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}'" deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <entity name="category" transformer="script:SplitAndPrettyCategory" query="select fc.bookId, group_concat(cr.name) as categoryName, from BookCat fc where fc.bookId = '${item.id}' AND group by fc.bookId">
      <field column="categoryType" name="categoryType"/>
    </entity>
  </entity>
</document>
-- - Noble Paul | Principal Engineer| AOL | http://aol.com
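Noble's diagnosis is that the deltaQuery returns a column named id, so its value is exposed as ${dataimporter.delta.id}; a name that no deltaQuery column provides has nothing to resolve to. The toy resolver below (hypothetical, not DIH's own variable resolver) makes the mismatch visible. It assumes, as an illustration, that an unresolved variable becomes an empty string, which would leave the deltaImportQuery matching no rows.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ${...} substitution, used only to show the variable-name mismatch.
class DeltaTemplateSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces ${name} with its value from `vars`; unknown names become "" (an assumption of this sketch).
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // "SELECT id FROM `Book` ..." yields a column named id, exposed as dataimporter.delta.id.
        Map<String, String> deltaRow = Map.of("dataimporter.delta.id", "17");
        System.out.println(resolve("... WHERE f.id='${dataimporter.delta.id}'", deltaRow));          // ... WHERE f.id='17'
        System.out.println(resolve("... WHERE f.id='${dataimporter.delta.job_jobs_id}'", deltaRow)); // ... WHERE f.id=''
    }
}
```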
Re: Multicore - Post xml to core0, core1 or core2
Try this: java -Durl=http://localhost:8983/solr/core0/update -jar post.jar *.xml On Wed, Nov 25, 2009 at 3:23 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: Hello, at the moment I am trying to create a Solr instance with more than one core. I use Solr 1.4 and multicore runs :-) But I don't know how to post an XML file to one of my cores. At the moment I use java -jar post.jar *.xml Now I want to fill the core0 index with core0*.xml, and core1 with core1*.xml. But how? In the wiki I can't find anything about that. King -- - Noble Paul | Principal Engineer| AOL | http://aol.com
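The same idea without post.jar: send the XML to the core's own /update URL. A minimal Java sketch using only java.net; the core names and the localhost:8983 address come from the thread, while the class and methods are hypothetical. post.jar sends a commit for you, and this sketch does not, so it posts <commit/> explicitly.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for posting XML update messages to a specific core.
class CorePoster {

    // Each core has its own update handler under /solr/<core>/update.
    static String updateUrl(String solrBase, String core) {
        return solrBase + "/" + core + "/update";
    }

    // POSTs an XML update message (e.g. <add>...</add> or <commit/>) and returns the HTTP status.
    static int post(String url, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String url = updateUrl("http://localhost:8983/solr", "core0");
        System.out.println(post(url, "<add><doc><field name=\"id\">1</field></doc></add>"));
        System.out.println(post(url, "<commit/>"));
    }
}
```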
Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
Remove the <lst name="slave"> section from your solrconfig. It should be fine. On Tue, Dec 1, 2009 at 6:59 AM, William Pierce evalsi...@hotmail.com wrote: Hi, Joe: I tried with fetchindex all lower-cased, and still got the same result. What do you specify for masterUrl in the solrconfig.xml on the slave? It seems to me that if I remove the element, I get the exception I wrote about. If I set it to some dummy URL, then I get an invalid URL message when I run command=details on the slave replication handler. What I am doing does not look out of the ordinary. I want to control the masterUrl and the time of replication myself. As such, I want neither the masterUrl nor the polling interval in the config file. Can you share relevant snippets of your config file and the exact URL your code is generating? Thanks, - Bill -- From: Joe Kessel isjust...@hotmail.com Sent: Monday, November 30, 2009 3:45 PM To: solr-user@lucene.apache.org Subject: RE: How to avoid hardcoding masterUrl in slave solrconfig.xml? I do something very similar and it works for me. I noticed that your URL has a mixed-case fetchIndex, whereas the request handler checks for fetchindex, all lowercase. If it is not that simple, I can try to see the exact URL my code is generating. Hope it helps, Joe From: evalsi...@hotmail.com To: solr-user@lucene.apache.org Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml? Date: Mon, 30 Nov 2009 13:48:38 -0800 Folks: Sorry for this repost! It looks like this email went out twice. Thanks, - Bill -- From: William Pierce evalsi...@hotmail.com Sent: Monday, November 30, 2009 1:47 PM To: solr-user@lucene.apache.org Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml? Folks: I do not want to hardcode the masterUrl in the solrconfig.xml of my slave. If the masterUrl tag is missing from the config file, I am getting an exception in Solr saying that the masterUrl is required.
So I set it to some dummy value, comment out the poll interval element, and issue a replication command manually like so: http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication Now there is no internal exception; Solr responds with a status OK for the above request and the Tomcat logs show no error, but the index is not replicated. When I issue the details command to the slave, I see that it ignored the masterUrl on the command line and instead complains that the master URL in the config file (which I had set to a dummy value) is not correct. (Just FYI, I have tried sending the masterUrl to the above command both with and without URL encoding; in both cases I got the same result.) So how exactly do I avoid hardcoding the masterUrl in the config file? Any pointers/help will be greatly appreciated! - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
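A hedged Java sketch of building the one-off replication request discussed in this thread: lowercase command=fetchindex (Joe's observation) and the master's replication URL URL-encoded as a separate masterUrl parameter. Host names and ports are placeholders, the class is hypothetical, and per Noble's reply this is meant to be used once the slave section has been removed from solrconfig.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a one-time fetchindex request for a slave.
class FetchIndexUrl {

    // Produces <slave replication URL>?command=fetchindex&masterUrl=<encoded master URL>.
    static String build(String slaveReplicationUrl, String masterReplicationUrl) {
        try {
            return slaveReplicationUrl + "?command=fetchindex&masterUrl="
                    + URLEncoder.encode(masterReplicationUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/postings/replication",
                "http://localhost:8080/postingsmaster/replication"));
    }
}
```

A cron job could call this once per slave, which is the setup Bill describes.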
Re: Thought that masterUrl in slave solrconfig.xml is optional...
Remove the slave section completely and startup will go through fine. On Tue, Dec 1, 2009 at 2:47 AM, William Pierce evalsi...@hotmail.com wrote: Folks: Reading the wiki, I saw the following statement: Force a fetchindex on slave from master command : http://slave_host:port/solr/replication?command=fetchindex It is possible to pass on extra attribute 'masterUrl' or other attributes like 'compression' (or any other parameter which is specified in the <lst name="slave"> tag) to do a one time replication from a master. This obviates the need for hardcoding the master in the slave. In my case, I cannot hardcode the masterUrl in the config file. I want a cron job to issue the replication commands for each of the slaves. So I issued the following command: http://localhost/postings/replication?command=fetchIndex&masterUrl=http%3a%2f%2flocalhost%2fpostingsmaster I got the following exception: HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - org.apache.solr.common.SolrException: 'masterUrl' is required for a slave at org.apache.solr.handler.SnapPuller.<init>(SnapPuller.java:126) at other lines removed Why is the error message asking me to specify the masterUrl in the config file when the wiki states that this is optional? Or am I understanding this incorrectly? Thanks, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: 'Connection reset' in DataImportHandler Development Console
The debug tool for DIH fires queries in sync mode: it waits for the import to complete before the page shows up. If the process takes long, you are likely to see the connection reset message. Forget about debug. What exactly do you want to do? 2009/8/17 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: Apparently I do not see any command (full-import, delta-import) being fired. Is that true? On Mon, Aug 17, 2009 at 5:55 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Hi folks, I'm trying to use the Debug Now button in the development console to test the effects of some changes in my data import config (see attached). However, each time I click it, the right-hand frame fails to load -- it just gets replaced with the standard 'connection reset' message from Firefox, as if the server's dropped the HTTP connection. Everything else seems okay -- I can run queries in Solr Admin without any problems, and all the other buttons in the dev console work -- status, document count, reload config etc. There's nothing suspicious in Tomcat's catalina.out either. If I hit Reload Config, then Status, then Debug Now, I get this: 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml} 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: id is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: title is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: doc_type is a required field in SolrSchema . 
But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: id is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: title is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: doc_type is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: id is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: title is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: doc_type is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: id is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: title is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema INFO: doc_type is a required field in SolrSchema . But not found in DataConfig 17-Aug-2009 13:12:12 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={clean=false&command=reload-config&commit=true&qt=/dataimport} status=0 QTime=5 17-Aug-2009 13:12:21 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={clean=false&command=status&commit=true&qt=/dataimport} status=0 QTime=0 (The warnings are because the doc_type field comes out of the JDBC result set automatically by column name -- this isn't a problem.)
Also, there's no entry in the Tomcat access log for the debug request either, just the first two: [17/Aug/2009:13:12:12 +0100] HTTP/1.1 cookie:- request:- GET /solr/select 200 ?clean=false&commit=true&qt=%2Fdataimport&command=reload-config GET /solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config HTTP/1.1 [17/Aug/2009:13:12:21 +0100] HTTP/1.1 cookie:- request:- GET /solr/select 200 ?clean=false&commit=true&qt=%2Fdataimport&command=status GET /solr/select?clean=false&commit=true&qt=%2Fdataimport&command=status HTTP/1.1 PS... Nightly build, 30th of July. Thanks, Andrew. http://www.nabble.com/file/p25005850/dataconfig.xml dataconfig.xml -- View this message in context: http://www.nabble.com/%27Connection-reset%27-in-DataImportHandler-Development-Console-tp25005850p25005850.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
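Since the debug page holds one HTTP request open until the whole import finishes, an alternative is to start a normal import and poll its progress. The sketch below only builds the request URLs, in the same shape as the access-log entries above (qt=/dataimport&command=...). Assumptions: the base URL http://localhost:8080/solr/select is hypothetical (adjust host, port and webapp path), and it relies on our understanding that a non-debug full-import returns immediately and keeps running in the background, so command=status can be polled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DihRequests {
    // Assumed base URL; not taken from the thread.
    public static final String BASE = "http://localhost:8080/solr/select";

    // Builds a GET URL for the /dataimport handler: qt and command first,
    // then any extra parameters, each value URL-encoded ("/" becomes %2F).
    public static String dihUrl(String command, Map<String, String> extra) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "/dataimport");
        params.put("command", command);
        params.putAll(extra);
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + "?" + query;
    }

    public static void main(String[] args) {
        // Kick off a regular (non-debug) import...
        System.out.println(dihUrl("full-import", Map.of("clean", "false")));
        // ...then poll its progress instead of waiting on one long request.
        System.out.println(dihUrl("status", Map.of()));
    }
}
```

Polling keeps each HTTP request short, so neither the browser nor a proxy has to hold a connection open for the full duration of the import.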
Re: Response writer configs
I guess we should remove these commented-out response writers from the example solrconfig. They add no value. On Wed, Dec 2, 2009 at 9:38 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Dec 1, 2009, at 9:04 PM, Ross wrote: I'm starting to play with Solr. This might be a silly question and not particularly important, but I'm curious. I set up the example site using the tutorial. It works very well. I was looking around the config files and noticed that in my solrconfig.xml the queryResponseWriter area is commented out, but they all still work: wt=php etc. returns the PHP format. How is it working if they're not defined? Are they defined elsewhere? Good question. Solr defines (in SolrCore) a default set of response writers:
  static {
    HashMap<String, QueryResponseWriter> m = new HashMap<String, QueryResponseWriter>();
    m.put("xml", new XMLResponseWriter());
    m.put("standard", m.get("xml"));
    m.put("json", new JSONResponseWriter());
    m.put("python", new PythonResponseWriter());
    m.put("php", new PHPResponseWriter());
    m.put("phps", new PHPSerializedResponseWriter());
    m.put("ruby", new RubyResponseWriter());
    m.put("raw", new RawResponseWriter());
    m.put("javabin", new BinaryResponseWriter());
    DEFAULT_RESPONSE_WRITERS = Collections.unmodifiableMap(m);
  }
Note that these built-in ones can be overridden, but not undefined, by registering a response writer with the same name as a built-in one. Erik -- - Noble Paul | Principal Engineer| AOL | http://aol.com
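Erik's "overridden, but not undefined" rule follows naturally if the effective writer table starts as a copy of the built-in defaults and configured writers are then layered on top by name. The sketch below models that idea under stated assumptions: writers are plain String labels standing in for QueryResponseWriter instances, and effectiveWriters and MyCustomJsonWriter are hypothetical names, not SolrCore's actual registration code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class WriterRegistry {
    // Built-in defaults, analogous to DEFAULT_RESPONSE_WRITERS in SolrCore.
    // Values are labels standing in for QueryResponseWriter instances.
    public static final Map<String, String> DEFAULTS;
    static {
        Map<String, String> m = new HashMap<>();
        m.put("xml", "XMLResponseWriter");
        m.put("standard", m.get("xml"));
        m.put("json", "JSONResponseWriter");
        m.put("php", "PHPResponseWriter");
        DEFAULTS = Collections.unmodifiableMap(m);
    }

    // Start from the defaults, then apply writers registered in solrconfig.xml.
    // Registering a built-in name replaces that entry; nothing removes a
    // default, so built-ins can be overridden but not undefined.
    public static Map<String, String> effectiveWriters(Map<String, String> registered) {
        Map<String, String> result = new HashMap<>(DEFAULTS);
        result.putAll(registered);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fromConfig = Map.of("json", "MyCustomJsonWriter");
        Map<String, String> writers = effectiveWriters(fromConfig);
        System.out.println(writers.get("json")); // MyCustomJsonWriter
        System.out.println(writers.get("php"));  // PHPResponseWriter
    }
}
```

This also explains Ross's observation: with every queryResponseWriter entry commented out, the registered map is empty and wt=php still resolves through the defaults.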