[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509069#comment-17509069 ]

Tim Allison edited comment on TIKA-3695 at 3/18/22, 8:31 PM:
-------------------------------------------------------------

Hmmm...not able to reproduce locally, and that is the right commit! 

{code:java}
java -jar ~/Intellij/tika-main/tika-server/tika-server-standard/target/tika-server-standard-2.3.1-SNAPSHOT.jar -c config.xml
{code}

{code:java}
curl -H "writeLimit:1000" -T huge-title.docx localhost:9998/rmeta/text
{code}

Result is:
{code:java}
[{"X-TIKA:EXCEPTION:metadata_limit_reached":"true","X-TIKA:Parsed-By":"org.apache.tika.parser.DefaultPa","X-TIKA:content":"\n\n\n\n\n\n\nTitle is huge.\n","Content-Type":"application/vnd.openxmlformats-officedocument.wordprocessingml.document"}]
{code}

I get the same result if I bump the writeLimit to your size (not that it
should matter).


> LimitingMetadataFilter
> ----------------------
>
>                 Key: TIKA-3695
>                 URL: https://issues.apache.org/jira/browse/TIKA-3695
>             Project: Tika
>          Issue Type: New Feature
>          Components: metadata
>    Affects Versions: 1.28.1, 2.3.0
>            Reporter: Julien Massiera
>            Priority: Major
>             Fix For: 2.3.1
>
>         Attachments: huge-title.docx, tika-config.xml
>
>
> Some files may contain abnormally large metadata (several MB, whether in
> the metadata values, the metadata names, or the total amount of metadata),
> which can cause excessive memory consumption.
> It would be great to develop a new LimitingMetadataFilter that filters
> out metadata according to separate byte limits (on metadata names,
> metadata values, and the total amount of metadata).
>  
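
For illustration only, here is a minimal sketch of the three limits requested in the description (per name, per value, and total). It assumes sizes are counted in UTF-8 bytes and works on a plain Map rather than Tika's Metadata class. The class name MetadataLimitSketch and its methods are hypothetical; this is not the filter implemented for this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tika's actual filter: limits are assumed to be UTF-8 byte counts.
final class MetadataLimitSketch {

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Truncates s so that its UTF-8 encoding fits in maxBytes without splitting a code point.
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    // Drops over-long names, truncates over-long values, and stops adding
    // values for a name once the total budget would be exceeded.
    static Map<String, List<String>> limit(Map<String, List<String>> metadata,
                                           int maxNameBytes,
                                           int maxValueBytes,
                                           int maxTotalBytes) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        int total = 0;
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String name = entry.getKey();
            int nameBytes = utf8Length(name);
            if (nameBytes > maxNameBytes) {
                continue; // name itself is too large: skip the whole entry
            }
            List<String> kept = new ArrayList<>();
            int used = nameBytes;
            for (String value : entry.getValue()) {
                String truncated = truncateUtf8(value, maxValueBytes);
                int valueBytes = utf8Length(truncated);
                if (total + used + valueBytes > maxTotalBytes) {
                    break; // total budget exhausted for this name
                }
                kept.add(truncated);
                used += valueBytes;
            }
            if (!kept.isEmpty()) {
                out.put(name, kept);
                total += used;
            }
        }
        return out;
    }
}
```

Whether an oversized value should be truncated or dropped outright is a design choice; the sketch truncates, which keeps a prefix of the value available to downstream consumers.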



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
