stevedlawrence commented on a change in pull request #253: Support terminators
and initiators containing %ES;
URL: https://github.com/apache/incubator-daffodil/pull/253#discussion_r298582978
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/DelimiterParsers.scala
##########
@@ -74,57 +74,20 @@ class DelimiterTextParser(
return false
}
- private def findES(delimIter: DelimiterIterator): Maybe[DFADelimiter] = {
- delimIter.reset()
- while (delimIter.hasNext()) {
- val d = delimIter.next()
- if (d.lookingFor == "%ES;") {
- return One(d)
- }
- }
- Nope
- }
-
- private def findLocalES(state: PState): Maybe[DFADelimiter] = {
- val delimIter = new LocalTypedDelimiterIterator(delimiterType,
state.mpstate.delimiters, state.mpstate.delimitersLocalIndexStack)
- findES(delimIter)
- }
+ override def parse(start: PState): Unit = {
- private def findRemoteES(state: PState): Maybe[DFADelimiter] = {
- val delimIter = new RemoteTypedDelimiterIterator(delimiterType,
state.mpstate.delimiters, state.mpstate.delimitersLocalIndexStack)
- findES(delimIter)
- }
+ val maybeDelimIter =
+ if (delimiterType == DelimiterTextType.Terminator && !isDelimited) {
+ Maybe(new LocalTypedDelimiterIterator(delimiterType,
start.mpstate.delimiters, start.mpstate.delimitersLocalIndexStack.top))
+ } else if (delimiterType == DelimiterTextType.Initiator ||
!start.delimitedParseResult.isDefined) {
+ Maybe(new
RemoteTerminatingMarkupAndLocalTypedDelimiterIterator(delimiterType,
start.mpstate.delimiters, start.mpstate.delimitersLocalIndexStack.top))
+ } else {
+ Nope
+ }
- override def parse(start: PState): Unit = {
val foundDelimiter =
- if (delimiterType == DelimiterTextType.Initiator ||
!start.delimitedParseResult.isDefined) {
- //
- // We are going to scan for delimiter text.
- //
- // FIXME: This is incorrect. It grabs all local delimiters even if
we're last in a group so the
Review comment:
Although I've confirmed that this bug does still exist, I think this comment
is wrong, or at least it's in the wrong place. The purpose of DelimiterParser
to to just consume delimiters, and it appears to do scana/consume the correct
delimiters correctly. For example, if I take the above schema and make all the
lengths explicit, I've confirmed that it scans for what you would expect--"foo"
scans only for a semicolon, "bar" scans for both the semicolon and comma, and
"baz" scans for just the comma (no scanning is required for "quux").
However, I have confirmed that things fail when lengths are delimited. In
fact, no actual scanning even happens in DelmiterParser in that case. All that
scanning is done in StringDelimitedParser, which caches the result and then
DelimiterParser just determines if i found an out of scope delimiter. So
there's definitely a bug in StringDelimitedParser scanning for the wrong
delimters. There might still be a bug somewhere in this parser, but that's
where things start to go off the rails. And it's possible something is broken
with how we build the delimiter stack.
So I think the issues are fundamental and spread out, and not just contained
in this parser. So I'm not sure these comment belongs here any more, and I
rather just have this information in a bug where we'll see it and keep track of
it.
I'll add these two test (explicit lengths and delimited length scanning) and
open new bug that has all the relevant information.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services