Hmmm, I'm not sure whether this fits into SOLR-445; could you add this comment to that patch discussion so we at least look at it?
Thanks,
Erick

On Thu, Apr 28, 2011 at 2:03 AM, Shinichiro Abe (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026137#comment-13026137 ]
>
> Shinichiro Abe commented on SOLR-2480:
> --------------------------------------
>
> Improvement ideas:
> 1, TikaException is always ignored, and index only the metadata.
> 2, Parameter "ignoreTikaException" is provided newly.
>    If it is true then it returns 200 response, if it is false then it throws TikaException.
> 3, If Solr can catch internal exception about encrypting error, it changes return code each exception.
>    If it can judge poi.EncryptedDocumentException, pdfbox.exceptions.CryptographyException. etc. then it returns 200 or another code response, if it judges the other exception then it throws TikaException.
>
>> Text extraction of password protected files
>> -------------------------------------------
>>
>> Key: SOLR-2480
>> URL: https://issues.apache.org/jira/browse/SOLR-2480
>> Project: Solr
>> Issue Type: Improvement
>> Components: contrib - Solr Cell (Tika extraction)
>> Affects Versions: 3.1
>> Reporter: Shinichiro Abe
>> Priority: Minor
>>
>> Proposal:
>> There are password-protected files. PDF, Office documents in 2007 format/97 format.
>> These files are posted using SolrCell.
>> We do not have to read these files if we do not know the reading password of files.
>> So, these files may not be extracted text.
>> My requirement is that these files should be processed normally without extracting text, and without throwing exception.
>> This background:
>> Now, when you post a password-protected file, solr returns 500 server error.
>> Solr catches the error in ExtractingDocumentLoader and throws TikaException.
>> I use ManifoldCF.
>> If the solr server responds 500, ManifoldCF judge is that "this document should be retried because I have absolutely no idea what happened".
>> And it attempts to retry posting many times without getting the password.
>> In the other case, my customer posts the files with embedded images.
>> Sometimes it seems that solr throws TikaException of unknown cause.
>> He wants to post just metadata without extracting text, but makes him stop posting by the exception.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
