Mike Beckerle created DAFFODIL-2504:
---------------------------------------

             Summary: Parse text of non-specified length from TCP hangs unnecessarily
                 Key: DAFFODIL-2504
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2504
             Project: Daffodil
          Issue Type: Bug
          Components: Back End
    Affects Versions: 3.0.0
            Reporter: Mike Beckerle


See tests

{color:#00627a}testDaffodilParseFromNetworkDelimited1{color}

{color:#00627a}testDaffodilParseFromNetworkDelimited1b{color}

{color:#00627a}testDaffodilParseFromNetworkDelimited2{color}

{color:#00627a}testDaffodilParseFromNetworkDelimited2b{color}

{color:#00627a}When parsing text from a TCP network stream, the parse should 
complete as soon as the parser knows it has matched the longest possible 
delimiter. It should not require any characters beyond those needed to 
establish that match to be present on the stream in order for the parse to 
complete. {color}
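A minimal sketch of the intended behavior, assuming fixed-length delimiters and a character iterator standing in for the TCP stream. The names DelimiterScan, Counting, and scan are hypothetical and do not correspond to Daffodil code. The scanner prefers the longest delimiter at a position and reads one more character only while a longer delimiter could still match, so with a single delimiter it reads nothing past the delimiter.

```scala
// Hypothetical sketch (not Daffodil's implementation): find the first fixed-length
// delimiter in a character stream, preferring the longest delimiter at a position,
// and pull only the characters needed to make that decision.
object DelimiterScan {
  /** Counts how many characters have actually been pulled from the "network". */
  final class Counting(underlying: Iterator[Char]) extends Iterator[Char] {
    var pulled = 0
    def hasNext: Boolean = underlying.hasNext
    def next(): Char = { pulled += 1; underlying.next() }
  }

  /** Returns Some((fieldText, delimiter)), or None if input ends with no delimiter. */
  def scan(in: Iterator[Char], delims: Seq[String]): Option[(String, String)] = {
    val buf = new StringBuilder
    // Pull characters only until index i is available; false at end of input.
    def ensure(i: Int): Boolean = {
      while (buf.length <= i && in.hasNext) buf.append(in.next())
      buf.length > i
    }
    var p = 0 // candidate start position of a delimiter
    while (true) {
      var k = 0 // characters examined at position p
      var best: Option[String] = None
      var live = delims.filter(_.nonEmpty) // delimiters consistent with buf(p until p + k)
      while (live.nonEmpty) {
        live.find(_.length == k).foreach(d => best = Some(d)) // a complete match so far
        val longer = live.filter(_.length > k)
        // Read one more character only if a longer delimiter could still match.
        if (longer.nonEmpty && ensure(p + k)) {
          val c = buf.charAt(p + k)
          live = longer.filter(_.charAt(k) == c)
          k += 1
        } else live = Nil
      }
      best match {
        case Some(d) => return Some((buf.substring(0, p), d))
        case None    => if (!ensure(p)) return None else p += 1
      }
    }
    None // unreachable; satisfies the type checker
  }
}
```

With delimiters "," and ",,", the input ab,cd needs exactly one character of lookahead (the c) to rule out ",,", whereas with "," alone the scan stops right after the comma.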

{color:#00627a}There are no tests yet, but presumably lengthKind 'pattern' 
has a similar issue: only enough characters should be needed to determine 
the longest possible match for the regex. For example, suppose 
dfdl:lengthPattern=".", which matches exactly one character (1 byte in a 
single-byte encoding). Matching it should NOT require more than one byte to 
be available on the TCP stream. 
{color}
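One possible approach for lengthKind 'pattern', sketched here with java.util.regex rather than whatever Daffodil uses internally: grow the buffer one character at a time and stop as soon as Matcher.hitEnd() is false after lookingAt(). Per the Matcher documentation, hitEnd() returning true means more input might have changed the result, so false means the outcome is final. IncrementalRegex, Counting, and matchPrefix are hypothetical names.

```scala
import java.util.regex.Pattern

// Hypothetical sketch: match a regex at the start of a stream that arrives
// incrementally, pulling one more character only while Matcher.hitEnd() reports
// that more input could change the result.
object IncrementalRegex {
  /** Counts how many characters have actually been pulled from the "network". */
  final class Counting(underlying: Iterator[Char]) extends Iterator[Char] {
    var pulled = 0
    def hasNext: Boolean = underlying.hasNext
    def next(): Char = { pulled += 1; underlying.next() }
  }

  /** Returns the text matched at the start of the stream, or None if no match. */
  def matchPrefix(pattern: Pattern, in: Iterator[Char]): Option[String] = {
    val buf = new java.lang.StringBuilder
    while (true) {
      val m = pattern.matcher(buf)
      val found = m.lookingAt()
      // hitEnd() == false: the engine never needed text beyond buf, so the
      // result (match or no match) cannot change. At end of input, also stop.
      if (!m.hitEnd() || !in.hasNext) return (if (found) Some(m.group()) else None)
      buf.append(in.next())
    }
    None // unreachable; satisfies the type checker
  }
}
```

With "." only one character is pulled, while "a(?=bbb)" matches just "a" but must see the three characters after it, which illustrates that the required lookahead depends on the regex.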

{color:#00627a}The arbitrary CharBuffer size of 8 in 
InputSourceDataInputStream means that around 8 characters of lookahead 
beyond the last character of the matched delimiter are required. Reducing 
this size to 2 allows the tests to succeed with less lookahead, but the whole 
approach/algorithm needs to be re-examined to determine how much lookahead is 
actually needed, and whether it can be avoided in many cases.{color}
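A simplified model of the effect described above, assuming that each refill decodes bytes into a fixed-size CharBuffer and keeps reading until that buffer is full or the input ends. This is not the actual InputSourceDataInputStream code; FixedBufferLookahead, CountingStream, fill, and fieldBefore are hypothetical. On a TCP socket, each extra byte requested by a refill is a byte the parse would block waiting for.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.StandardCharsets

// Hypothetical model (not Daffodil's code): characters are decoded into a
// fixed-size CharBuffer, and each refill reads bytes until the buffer is full.
object FixedBufferLookahead {
  /** Stands in for the network stream and counts bytes actually read. */
  final class CountingStream(bytes: Array[Byte]) extends InputStream {
    private val in = new ByteArrayInputStream(bytes)
    var bytesRead = 0
    override def read(): Int = { val b = in.read(); if (b >= 0) bytesRead += 1; b }
  }

  /** Refill: decode bytes into a CharBuffer of `size` chars until full or end of input. */
  def fill(in: InputStream, size: Int): CharBuffer = {
    val decoder = StandardCharsets.US_ASCII.newDecoder()
    val chars = CharBuffer.allocate(size)
    val one = ByteBuffer.allocate(1)
    var eof = false
    while (chars.hasRemaining && !eof) {
      val b = in.read()
      if (b < 0) eof = true
      else {
        one.clear(); one.put(b.toByte); one.flip()
        decoder.decode(one, chars, false) // one ASCII byte decodes to one char
      }
    }
    chars.flip()
    chars
  }

  /** Returns the text before the first `delim`, scanning one refill at a time. */
  def fieldBefore(in: InputStream, delim: Char, size: Int): Option[String] = {
    val field = new StringBuilder
    var buf = fill(in, size)
    while (buf.hasRemaining) {
      while (buf.hasRemaining) {
        val c = buf.get()
        if (c == delim) return Some(field.toString)
        field.append(c)
      }
      buf = fill(in, size)
    }
    None
  }
}
```

For the input ab,cdefghij and delimiter ',', a buffer size of 8 reads 8 bytes, size 2 reads 4, and size 1 reads exactly the 3 bytes up to and including the delimiter.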

{color:#00627a}It is known that 1 character of lookahead cannot always be 
avoided. For delimiters that use DFDL Character Class Entities able to match 
a variable number of characters (e.g., WSP+, WSP*, and NL), a lookahead of 
1 character is necessary to know whether the match is complete. {color}
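A minimal sketch of why one character of lookahead is unavoidable for variable-length entities, using a BufferedIterator whose head peeks at one character. As simplifying assumptions, WSP is limited to space and tab and NL to CRLF, LF, or CR (DFDL's actual definitions cover more characters); EntityMatch and its methods are hypothetical.

```scala
// Hypothetical sketch: match simplified WSP+ and NL entities, peeking at most one
// character past the match. WSP here is only space/tab; NL is only CRLF, LF, CR.
object EntityMatch {
  /** Counts how many characters have actually been pulled from the "network". */
  final class Counting(underlying: Iterator[Char]) extends Iterator[Char] {
    var pulled = 0
    def hasNext: Boolean = underlying.hasNext
    def next(): Char = { pulled += 1; underlying.next() }
  }

  private def isWsp(c: Char): Boolean = c == ' ' || c == '\t'

  /** WSP+: number of whitespace chars matched; must peek one char past the run. */
  def wspPlus(in: BufferedIterator[Char]): Int = {
    var n = 0
    while (in.hasNext && isWsp(in.head)) { in.next(); n += 1 }
    n
  }

  /** NL: the newline matched, peeking only after a CR (which might start CRLF). */
  def nl(in: BufferedIterator[Char]): Option[String] =
    if (!in.hasNext) None
    else in.head match {
      case '\n' => in.next(); Some("\n") // LF cannot be extended: no lookahead
      case '\r' =>
        in.next()
        // Peek one character: CRLF is the longest NL in this simplified set.
        if (in.hasNext && in.head == '\n') { in.next(); Some("\r\n") }
        else Some("\r")
      case _ => None
    }
}
```

WSP+ on "  x" must read the x to know the run has ended, and NL must peek after a CR, but an LF or a complete CRLF needs no lookahead because no longer NL match is possible in this simplified set.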

{color:#00627a}Regular expressions can look ahead an arbitrary finite 
distance, so for them the amount of lookahead required depends on the 
specific regex. {color}

{color:#00627a}Since some lookahead is required in these cases, fixing 
this issue for the simpler case of basic delimiters with a fixed number of 
characters seems relatively low priority. {color}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
