Hi Arthi,

I should have noticed this in my first response. When sending a binary
document you need to use "--data-binary @{file}" instead of
"--data". "--data" is the same as "--data-ascii" and strips carriage
returns and newlines when it reads a file, which corrupts binary
content. I made a short test on

    http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun

with "--data" and a PDF document and got an empty response. With
"--data-binary" I got the expected results.

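A rough sketch of the difference, not something from the Stanbol docs:
the sample file below is made up and only stands in for a PDF, and the
STANBOL_URL variable is an assumption so that the request is skipped
when no server is configured. It compares the size of the payload as
"--data-binary" would send it with the size after carriage returns and
newlines are removed, which approximates what "--data" does to a file.

```shell
#!/bin/sh
# Hypothetical sample payload standing in for a PDF: it contains CR and LF bytes.
sample=$(mktemp)
printf 'PDF\r\nbinary\npayload\r\n' > "$sample"

# Size of the file as it is, i.e. what --data-binary would post.
binary_size=$(wc -c < "$sample" | tr -d ' ')

# Size after removing CR/LF, approximating what --data does when reading a file.
ascii_size=$(tr -d '\r\n' < "$sample" | wc -c | tr -d ' ')

echo "unchanged: $binary_size bytes, CR/LF stripped: $ascii_size bytes"

# The PDF request from this thread with --data-binary. It only runs when
# STANBOL_URL (an assumed variable) points at an enhancer chain.
if [ -n "${STANBOL_URL:-}" ]; then
  curl -X POST -H "Accept: text/turtle" -H "Content-type: application/pdf" \
       --data-binary @Testpdf.pdf "$STANBOL_URL"
fi

rm -f "$sample"
```
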
If this does not solve your problem, you should try removing the
"optional" flag from the "tika" engine in your chain. With "tika"
required, a failure in that engine causes the whole enhancement
process to fail, so the error becomes visible. As long as "tika" is
marked as optional, errors are only logged and processing continues.

We had some issues with the Tika engine related to XML-based
office documents (e.g. STANBOL-810, STANBOL-970), but as PDF files
also do not work for you, I expect that your issues are caused by
something different.

Feel free to also test on the dev.iks-project.eu server, e.g.

    http://dev.iks-project.eu:8081/enhancer
    http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun
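
For testing several document types against one of these endpoints, a
small wrapper along the following lines might help. This is only a
sketch: "mime_for" and "enhance" are hypothetical helper names, and the
content types are the ones already used in this thread.

```shell
#!/bin/sh
# Map a file name to the Content-type used earlier in this thread.
mime_for() {
  case "$1" in
    *.pdf)  echo "application/pdf" ;;
    *.docx) echo "application/vnd.openxmlformats-officedocument.wordprocessingml.document" ;;
    *.doc)  echo "application/msword" ;;
    *)      echo "text/plain" ;;
  esac
}

# Hypothetical helper: post one file unchanged to an enhancer endpoint.
# $1 is the file to send, $2 is the URL of the enhancer or chain.
enhance() {
  curl -X POST -H "Accept: text/turtle" \
       -H "Content-type: $(mime_for "$1")" \
       --data-binary "@$1" "$2"
}
```

It could then be called as
"enhance Testpdf.pdf http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun".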

best
Rupert

On Sat, Sep 14, 2013 at 11:36 AM,  <arthi.ven...@wipro.com> wrote:
> Hi Rupert,
>   I tried the different MIME types, but with no luck.
> The same call with plain text or just data works fine.
> For example, the command below returns the enhancements:
>
> curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" --data 
> @FileToAugment.txt http://localhost:8080/enhancer/chain/MyCustomChain1
>
> However none of the below three commands give a response.
> curl -X POST -H "Accept: text/turtle" -H "Content-type: application/msword"
> --data @TextToEnhance97ver.doc
> "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> curl -X POST -H "Accept: text/turtle"
> -H "Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document"
> --data @TextToEnhance.docx
> "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> curl -X POST -H "Accept: text/turtle" -H "Content-type: application/pdf"
> --data @Testpdf.pdf "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> Please do share any pointers which you have on this.
> Note : MyCustomChain1  has below configuration :
>
>     tika ( optional , TikaEngine)
>     langdetect ( required , LanguageDetectionEnhancementEngine)
>     opennlp-sentence ( required , OpenNlpSentenceDetectionEngine)
>     opennlp-token ( required , OpenNlpTokenizerEngine)
>     opennlp-pos ( required , OpenNlpPosTaggingEngine)
>     opennlp-chunker ( required , OpenNlpChunkingEngine)
>     MyLinkingEngine ( required , EntityLinkingEngine)
>
>
> Thanking you and Regards,
> Arthi
>
>
>
>
> -----Original Message-----
> From: arthi venkataraman (WT01 - CTO Office)
> Sent: Saturday, September 14, 2013 12:49 PM
> To: dev@stanbol.apache.org
> Subject: RE: Exception while installing metaxa
>
> Thanks a lot Rupert
> I will check the Content type  and re-try.
>
> Thanks and Rgds,
> Arthi
>
>
> -----Original Message-----
> From: Rupert Westenthaler [mailto:rupert.westentha...@gmail.com]
> Sent: Saturday, September 14, 2013 12:46 PM
> To: dev@stanbol.apache.org
> Subject: Re: Exception while installing metaxa
>
> On Sat, Sep 14, 2013 at 9:05 AM,  <arthi.ven...@wipro.com> wrote:
>> Thanks a lot Rupert for response.
>> The reason for using Metaxa is that I want to enhance pdf and word documents 
>> using Stanbol.
>> I read that to process pdf and word we would need metaxa in the pipeline.
>
> The Tika Engine is also able to process Microsoft Word and PDF documents.
> Just have a look at the supported media types of Apache Tika.
>
>>
>> In the contenthub UI of Stanbol I am able to attach a pdf / word doc and
>> enhance this.
>> However, when I try the same from the command line using curl, it fails.
>>
>> Any idea how I could use Stanbol to enhance a word / pdf file from command 
>> line or alternately a simple Java program.
>>
>> Tried below calls but none of them work:
>>
>> curl -i -X POST -H "Content-Type:text/plain" --data @TextToEnhance.docx
>> "http://localhost:8080/contenthub/contenthub/store?uri=urn:my-content-item2&chain=MyCustomChain1"
>> -u admin:admin
>>
>> curl -i -X POST -H "Content-Type:application/word" --data @TextToEnhance.docx
>> "http://localhost:8080/contenthub/contenthub/store?uri=urn:my-content-item2&chain=MyCustomChain1"
>> -u admin:admin
>>
>
> I need to go offline and do not have time to validate this answer, but IMO 
> this fails because the content type for docx is not application/word. See [1] 
> for a list of Content-Types for the new XML based MS office formats
>
> best
> Rupert
>
> [1] 
> http://stackoverflow.com/questions/4212861/what-is-a-correct-mime-type-for-docx-pptx-etc
>
>>
>> Thanks and Rgds,
>> Arthi
>>
>>
>> -----Original Message-----
>> From: Rupert Westenthaler [mailto:rupert.westentha...@gmail.com]
>> Sent: Saturday, September 14, 2013 12:27 PM
>> To: dev@stanbol.apache.org
>> Subject: Re: Exception while installing metaxa
>>
>> Hi Arthi,
>>
>> I have not used Metaxa for a while. Typically you should use the Tika
>> engine [1] (based on Apache Tika) for processing non plain text documents.
>>
>> To use it (with the default configuration) it is usually sufficient to
>> include "tika" in your enhancement chain. If you are configuring a
>> ListChain you will need to put the "tika" engine in first position.
>> In case of a WeightedChain the ordering in the config does not matter.
>>
>> best
>> Rupert
>>
>>
>> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
>>
>> On Fri, Sep 13, 2013 at 1:45 PM,  <arthi.ven...@wipro.com> wrote:
>>> Hi,
>>>    I am trying to install the metaxa bundle.
>>>
>>> I did a mvn clean install in the stanbol\enhancement-engines\metaxa
>>> directory.
>>> From the http://localhost:8080/system/console/bundles menu I installed the
>>> metaxa jar.
>>>
>>> I got the below exception in the Stanbol window. Any idea how this issue
>>> can be fixed?
>>>
>>> ERROR: Bundle org.apache.stanbol.enhancer.engines.metaxa [253]: Error
>>> starting/stopping bundle. (org.osgi.framework.BundleException:
>>> Unresolved constraint in bundle org.apache.stanbol.enhancer.engines.metaxa
>>> [253]: Unable to resolve 253.0:
>>> missing requirement [253.0] package; (package=javax.microedition.io))
>>> org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> org.apache.stanbol.enhancer.engines.metaxa [253]: Unable to resolve
>>> 253.0: missing requirement [253.0] package; (package=javax.microedition.io)
>>>         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>         at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1333)
>>>         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
>>>         at java.lang.Thread.run(Unknown Source)
>>>
>>>
>>>
>>> Thanks and Rgds,
>>> Arthi
>>>
>>>
>>> Please do not print this email unless it is absolutely necessary.
>>>
>>> The information contained in this electronic message and any attachments to 
>>> this message are intended for the exclusive use of the addressee(s) and may 
>>> contain proprietary, confidential or privileged information. If you are not 
>>> the intended recipient, you should not disseminate, distribute or copy this 
>>> e-mail. Please notify the sender immediately and destroy all copies of this 
>>> message and any attachments.
>>>
>>> WARNING: Computer viruses can be transmitted via email. The recipient 
>>> should check this email and any attachments for the presence of viruses. 
>>> The company accepts no liability for any damage caused by any virus 
>>> transmitted by this email.
>>>
>>> www.wipro.com
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
