Re: [Joda-interest] Feature request: CharSequence instead of Strings for parsing…

Viktor Hedefalk Mon, 11 Jul 2011 13:47:41 -0700

Ok, thanks!

I could change my parser combinator to look like this:


trait DateParsers extends RegexParsers {
  def dateTime(pattern: String) = new Parser[LocalDate] {
    val dateFormat = DateTimeFormatters.pattern(pattern)

    def jodaParse(text: CharSequence, offset: Int) = {
      val parsePosition = new ParsePosition(offset)
      val result = dateFormat.parse(text, parsePosition)
      val date = () => result.toCalendricalMerger().getDate(false)
      (date, parsePosition)
    }

    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      val (date, parsePosition) = jodaParse(source, start)
      if (parsePosition.getErrorIndex >= 0)
          Failure("Failed to parse date", in.drop(start - offset))
      else
          Success(date(), in.drop(parsePosition.getIndex - offset))
    }
  }
}

to get a LocalDate. I'll look into ThreeTen a bit more to see what
I'll use exactly, but anyway it seems to work: my large batch of
tests from that old project passes :)


class DateParserSpec extends FlatSpec with ShouldMatchers with DateParsers {
  "A DateParser" should "fail on weird output" in {
    val result = parseAll(dateTime("EEE MMM d HH:mm:ss"), "   xxx")
    result.successful should be(false)
  }
  it should "run fine on a matching date" in {
    val result = parseAll(dateTime("EEE MMM d HH:mm:ss yyyy Z"),
      "Wed Aug 12 18:49:56 2009 +0200")
    result.successful should be(true)
  }
  it should "run fine on another matching date" in {
    val result = parseAll(dateTime("HH:ss:mm yyyy MMM"), "  22:56:23 2010 Jan  ")
    result.successful should be(true)
  }
}
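
For readers less familiar with the ParsePosition contract the parser above relies on (index advanced on success, error index set on failure), here is a minimal Java sketch. It uses java.text.SimpleDateFormat from the JDK as a stand-in, not the ThreeTen formatter, and the class and method names are hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class; SimpleDateFormat stands in for the ThreeTen formatter.
public final class PositionedParseDemo {
    private PositionedParseDemo() {}

    /**
     * Parses a yyyy-MM-dd date starting at the given offset.
     * Returns the index just past the parsed date, or -1 on failure.
     */
    public static int parseDateEnd(String text, int offset) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(offset);
        Date date = format.parse(text, position);
        // On failure parse returns null and sets the error index instead of the index.
        if (date == null || position.getErrorIndex() >= 0) {
            return -1;
        }
        return position.getIndex();
    }
}
```

The returned index plays the same role as parsePosition.getIndex in the combinator: it tells the caller how much input to drop before continuing.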


I still really think that the call to toString() is a problem. Sure, a
CharSequence is a random-access data structure, but it is still very
usable from a streaming point of view as long as you don't look ahead
too much. This is exactly how the Scala combinator parsers work. Look
at, for instance, PagedSeq:

http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/immutable/PagedSeq$.html

This is a lazily evaluated sequence that stores its elements in
pages of fixed-length arrays. And I can do:

val input = new PagedSeqReader(PagedSeq.fromFile(new File("my/path/file.txt")))

and use this lazy character stream straight away in my combinator parser.

The toString method, however, evaluates the whole sequence eagerly:

/** Convert sequence to string */
  override def toString = {
    val buf = new StringBuilder
    for (ch <- PagedSeq.this.iterator) buf append ch
    buf.toString
  }

which would make my date-parsing wrapper unusable for, say, a stream
over a network or a really large file.
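
To make the cost concrete, here is a small Java sketch. It is not ThreeTen or Scala library code: CountingCharSequence and parseLeadingDigits are hypothetical names. It counts how many characters a parser touches when it reads through charAt compared with when toString() is called first.

```java
// Hypothetical wrapper that counts character reads on an underlying CharSequence.
public final class CountingCharSequence implements CharSequence {
    private final CharSequence delegate;
    private int reads = 0;

    public CountingCharSequence(CharSequence delegate) {
        this.delegate = delegate;
    }

    /** Number of charAt calls made so far. */
    public int reads() {
        return reads;
    }

    @Override public int length() {
        return delegate.length();
    }

    @Override public char charAt(int index) {
        reads++;
        return delegate.charAt(index);
    }

    @Override public CharSequence subSequence(int start, int end) {
        return new CountingCharSequence(delegate.subSequence(start, end));
    }

    /** Like the PagedSeq toString quoted above: walks every character. */
    @Override public String toString() {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < length(); i++) {
            buf.append(charAt(i));
        }
        return buf.toString();
    }

    /** Reads digits starting at offset and stops at the first non-digit. */
    public static int parseLeadingDigits(CharSequence text, int offset) {
        int value = 0;
        for (int i = offset; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isDigit(c)) {
                break;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```

Parsing the year from the front of a long input touches only the digits plus one lookahead character, while a single toString() call touches every character in the input.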

I've looked at it again and to me it doesn't seem too hard to fix:
https://github.com/hedefalk/threeten/commit/517c1bcc6d7c4982f90a41781506d2616e9772f4
- tests pass.

But then again I might be biased since I think this one is very important ;)

The biggest issue is regionMatches, I guess, and I had to introduce a
Util class again. If I were to issue a pull request, what would be
your preferred way of handling that one?
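
For reference, a helper of this kind could look roughly like the Java sketch below. This is an illustration, not the code from the commit linked above: the class name CharSequenceUtil is hypothetical, and the semantics are assumed to mirror String.regionMatches.

```java
// Hypothetical utility; assumed to follow the semantics of String.regionMatches.
public final class CharSequenceUtil {
    private CharSequenceUtil() {}

    /** Compares a region of text with a region of other, reading only via charAt and length. */
    public static boolean regionMatches(CharSequence text, boolean ignoreCase, int offset,
                                        CharSequence other, int otherOffset, int len) {
        // Same bounds rules as String.regionMatches; long arithmetic avoids int overflow.
        if (offset < 0 || otherOffset < 0
                || offset > (long) text.length() - len
                || otherOffset > (long) other.length() - len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            char a = text.charAt(offset + i);
            char b = other.charAt(otherOffset + i);
            if (a == b) {
                continue;
            }
            if (ignoreCase) {
                char ua = Character.toUpperCase(a);
                char ub = Character.toUpperCase(b);
                if (ua == ub || Character.toLowerCase(ua) == Character.toLowerCase(ub)) {
                    continue;
                }
            }
            return false;
        }
        return true;
    }
}
```

Note that this sketch calls length() on both arguments, which a lazy sequence may have to evaluate eagerly; a streaming-friendly variant would need a different bounds check.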

Thanks,
Viktor


On Mon, Jul 11, 2011 at 8:53 PM, Stephen Colebourne
<scolebou...@joda.org> wrote:
> On 11 July 2011 19:21, Viktor Hedefalk <hedef...@gmail.com> wrote:
>> I guess that the method that could be possible to use in ThreeTen
>> would be this one?
>>    public DateTimeParseContext parse(CharSequence text, ParsePosition
>> position) {
>
> If you need the ParsePosition, then that is the one.
>
>> This line hurts
>>
>>  // parse a String as its a better API for parser writers
>>  String str = text.toString();
>>
>> since it will be the entire input I'm parsing but I guess it probably
>> works in practice in my case, I'll have to try it out and get back.
>
> Thats the current choice I'm making, CharSequence outside, String inside.
>
>> Just of curiosity, what is it in String that makes it easier for parser 
>> writers?
>
> Its just a bigger API, with startsWith, contains, indexOf,
> regionMatches ... I tried to convert it to CharSeq internally, but it
> seemed like more hassle than it was worth. If you can convince me its
> really a major hassle or performance issue, then I might accept a pull
> request, but I'd prefer not to if possible.
>
> Stephen
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2d-c2
> _______________________________________________
> Joda-interest mailing list
> Joda-interest@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/joda-interest
>

