#36747: parse_duration() fails to parse valid ISO-8601 durations including 
years,
months, and weeks due to incorrect regex
--------------------------------+--------------------------------------
     Reporter:  florianvazelle  |                    Owner:  (none)
         Type:  Bug             |                   Status:  new
    Component:  Uncategorized   |                  Version:  5.2
     Severity:  Normal          |               Resolution:
     Keywords:                  |             Triage Stage:  Unreviewed
    Has patch:  0               |      Needs documentation:  0
  Needs tests:  0               |  Patch needs improvement:  0
Easy pickings:  0               |                    UI/UX:  0
--------------------------------+--------------------------------------
Description changed by florianvazelle:

Old description:

> == **Description**
>
> `django.utils.dateparse.parse_duration()` claims to support ISO-8601
> duration strings, but the current implementation only handles the
> `PnDTnHnMnS` subset. Valid ISO-8601 components such as years (`Y`),
> months (`M`), and weeks (`W`) are not supported.
>
> The internal regex used for ISO-8601 durations:
>
> {{{
> iso8601_duration_re = _lazy_re_compile(
>     r"^(?P<sign>[-+]?)"
>     r"P"
>     r"(?:(?P<days>\d+([.,]\d+)?)D)?"
>     r"(?:T"
>     r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
>     r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
>     r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
>     r")?"
>     r"$"
> )
> }}}
>
> This means Django currently rejects valid duration strings such as:
>

> {{{
> P1Y2M
> P3W
> P1Y
> P2M10DT2H
> }}}
>

>
> Despite the documentation suggesting ISO-8601 support, these forms cannot
> be parsed.
>
> == **Steps to Reproduce**
>
> {{{
> from django.utils.dateparse import parse_duration
>
> parse_duration("P1Y")      # returns None
> parse_duration("P2M")      # returns None
> parse_duration("P3W")      # returns None
> parse_duration("P1Y2M3DT4H")  # returns None
> }}}
>
> == **Expected Behavior**
>
> `parse_duration()` should parse all valid ISO-8601 durations that can be
> represented as `timedelta`, including:
>
> * `PnW`
> * `PnYnMnD`
> * ...
>
> Django should capture each calendar unit, or clearly state limitations.
>
> == **Proposed Fix**
>
> 1. Replace the current `iso8601_duration_re` with one that uses distinct
> group names for each ISO-8601 calendar unit:
>

>
> {{{
> iso8601_duration_re = _lazy_re_compile(
>     r"^(?P<sign>[-+]?)"
>     r"P"
>     r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
>     r"(?:(?P<months>\d+([.,]\d+)?)M)?"
>     r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
>     r"(?:(?P<days>\d+([.,]\d+)?)D)?"
>     r"(?:T"
>     r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
>     r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
>     r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
>     r")?"
>     r"$"
> )
> }}}
>

>
> 2. Extend `parse_duration()` to convert these new fields to timedelta.
>

> {{{
> def parse_duration(value):
>     match = (
>         standard_duration_re.match(value)
>         or iso8601_duration_re.match(value)
>         or postgres_interval_re.match(value)
>     )
>     if match:
>         kw = match.groupdict()
>         sign = -1 if kw.pop("sign", "+") == "-" else 1
>         if kw.get("microseconds"):
>             kw["microseconds"] = kw["microseconds"].ljust(6, "0")
>         kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v
> is not None}
>         days = datetime.timedelta(kw.pop("days", 0.0) or 0.0)
>
>         if match.re == iso8601_duration_re:
> +           years = kw.pop("years", 0.0)
> +           months = kw.pop("months", 0.0)
> +           weeks = kw.pop("weeks", 0.0)
> +
> +           days = datetime.timedelta(years=years, months=months,
> days=kw.pop("days", 0.0) + (weeks * 7))
>             days *= sign
>
>         return days + sign * datetime.timedelta(**kw)
> }}}
>

> I can provide a full patch (tests + implementation) if desired.

New description:

 == **Description**

 `django.utils.dateparse.parse_duration()` claims to support ISO-8601
 duration strings, but the current implementation only handles the
 `PnDTnHnMnS` subset. Valid ISO-8601 components such as years (`Y`), months
 (`M`), and weeks (`W`) are not supported.

 The internal regex used for ISO-8601 durations:

 {{{
 iso8601_duration_re = _lazy_re_compile(
     r"^(?P<sign>[-+]?)"
     r"P"
     r"(?:(?P<days>\d+([.,]\d+)?)D)?"
     r"(?:T"
     r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
     r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
     r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
     r")?"
     r"$"
 )
 }}}

 This means Django currently rejects valid duration strings such as:


 {{{
 P1Y2M
 P3W
 P1Y
 P2M10DT2H
 }}}



 Despite the documentation suggesting ISO-8601 support, these forms cannot
 be parsed.

 == **Steps to Reproduce**

 {{{
 from django.utils.dateparse import parse_duration

 parse_duration("P1Y")      # returns None
 parse_duration("P2M")      # returns None
 parse_duration("P3W")      # returns None
 parse_duration("P1Y2M3DT4H")  # returns None
 }}}

 == **Expected Behavior**

 `parse_duration()` should parse all valid ISO-8601 durations that can be
 represented as `timedelta`, including:

 * `PnW`
 * `PnYnMnD`
 * ...

 Django should capture each calendar unit, or clearly state limitations.

--
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36747#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/0107019aa0e0934f-1cdba3fd-984b-4d64-91fb-f88e0c47dd02-000000%40eu-central-1.amazonses.com.

Reply via email to