[ https://issues.apache.org/jira/browse/METRON-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Zeolla updated METRON-545:
------------------------------
    Description: Due to a limitation in Lucene, where an individual term cannot 
be larger than 32766 bytes (assuming UTF-8 encoding, this is at least 8,191 
characters, since a UTF-8 character occupies at most 4 bytes), and assuming that 
we cannot easily identify the field datatype per the intent of the user (string 
vs integer vs ...), we should truncate fields if they are larger than 32766 
bytes.  This should be somewhat rare, but even when it occurs we can leverage 
the dual storage (HDFS and Lucene), integrity checking fields (METRON-544), and 
customizability of the UI (METRON-195) in order to retrieve the full original 
field value.  (was: Due to a limitation
with using lucene where an individual term cannot be larger than 32766, and 
assuming that we cannot easily identify the field datatype per the intent of 
the user (string vs integer vs ...), we should truncate fields if they are 
larger than 32766.  This should be somewhat rare, but even in cases where it 
occurs we can leverage the dual storage (HDFS and Lucene), integrity checking 
fields (METRON-544), and customizability of the UI (METRON-195) in order to 
retrieve the full original field value.)
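A minimal sketch of the proposed truncation, assuming the field is indexed as a single unanalyzed term and that the 32766 limit is measured in UTF-8 bytes. The helper name truncate_utf8 and the constant MAX_TERM_BYTES are hypothetical; this is illustrative Python, not Metron's actual (Java) code. It cuts on a character boundary, so any value of up to 8,191 characters (32766 // 4) always survives intact, while pure ASCII keeps up to 32,766 characters.

```python
# Hypothetical constant: Lucene's per-term limit, in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def truncate_utf8(value: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Return the longest prefix of `value` whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper for illustration. The cut is made on the encoded bytes,
    then any incomplete multi-byte character left at the end is dropped, so the
    result is always valid UTF-8 and never exceeds max_bytes.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value  # Already fits; leave the field untouched.
    # A byte slice of valid UTF-8 can only be broken at its tail, so
    # errors="ignore" discards just that partial trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(truncate_utf8("héllo", 2))  # -> "h" ("é" needs 2 bytes, only 1 is left)
```

Truncating by bytes rather than by characters matters here: slicing a string to 32766 characters could still yield up to 131,064 bytes of UTF-8, well over the limit.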

> Truncate fields larger than 32766
> ---------------------------------
>
>                 Key: METRON-545
>                 URL: https://issues.apache.org/jira/browse/METRON-545
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Jon Zeolla
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)