#36747: parse_duration() fails to parse valid ISO-8601 durations including
years,
months, and weeks due to incorrect regex
--------------------------------+--------------------------------------
Reporter: florianvazelle | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Description changed by florianvazelle:
Old description:
> == **Description**
>
> `django.utils.dateparse.parse_duration()` claims to support ISO-8601
> duration strings, but the current implementation only handles the
> `PnDTnHnMnS` subset. Valid ISO-8601 components such as years (`Y`),
> months (`M`), and weeks (`W`) are not supported.
>
> The internal regex used for ISO-8601 durations:
>
> {{{
> iso8601_duration_re = _lazy_re_compile(
> r"^(?P<sign>[-+]?)"
> r"P"
> r"(?:(?P<days>\d+([.,]\d+)?)D)?"
> r"(?:T"
> r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
> r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
> r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
> r")?"
> r"$"
> )
> }}}
>
> This means Django currently rejects valid duration strings such as:
>
> {{{
> P1Y2M
> P3W
> P1Y
> P2M10DT2H
> }}}
>
>
> Despite the documentation suggesting ISO-8601 support, these forms cannot
> be parsed.
>
> == **Steps to Reproduce**
>
> {{{
> from django.utils.dateparse import parse_duration
>
> parse_duration("P1Y") # returns None
> parse_duration("P2M") # returns None
> parse_duration("P3W") # returns None
> parse_duration("P1Y2M3DT4H") # returns None
> }}}
>
> == **Expected Behavior**
>
> `parse_duration()` should parse all valid ISO-8601 durations that can be
> represented as `timedelta`, including:
>
> * `PnW`
> * `PnYnMnD`
> * ...
>
> Django should capture each calendar unit, or clearly state limitations.
>
> == **Proposed Fix**
>
> 1. Replace the current `iso8601_duration_re` with one that uses distinct
> group names for each ISO-8601 calendar unit:
>
>
> {{{
> iso8601_duration_re = _lazy_re_compile(
> r"^(?P<sign>[-+]?)"
> r"P"
> r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
> r"(?:(?P<months>\d+([.,]\d+)?)M)?"
> r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
> r"(?:(?P<days>\d+([.,]\d+)?)D)?"
> r"(?:T"
> r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
> r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
> r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
> r")?"
> r"$"
> )
> }}}
>
>
> 2. Extend `parse_duration()` to convert these new fields to timedelta.
>
> {{{
> def parse_duration(value):
> match = (
> standard_duration_re.match(value)
> or iso8601_duration_re.match(value)
> or postgres_interval_re.match(value)
> )
> if match:
> kw = match.groupdict()
> sign = -1 if kw.pop("sign", "+") == "-" else 1
> if kw.get("microseconds"):
> kw["microseconds"] = kw["microseconds"].ljust(6, "0")
> kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v
> is not None}
> days = datetime.timedelta(kw.pop("days", 0.0) or 0.0)
>
> if match.re == iso8601_duration_re:
> + years = kw.pop("years", 0.0)
> + months = kw.pop("months", 0.0)
> + weeks = kw.pop("weeks", 0.0)
> +
> + days = datetime.timedelta(years=years, months=months,
> days=kw.pop("days", 0.0) + (weeks * 7))
> days *= sign
>
> return days + sign * datetime.timedelta(**kw)
> }}}
>
> I can provide a full patch (tests + implementation) if desired.
New description:
== **Description**
`django.utils.dateparse.parse_duration()` claims to support ISO-8601
duration strings, but the current implementation only handles the
`PnDTnHnMnS` subset. Valid ISO-8601 components such as years (`Y`), months
(`M`), and weeks (`W`) are not supported.
The internal regex used for ISO-8601 durations:
{{{
iso8601_duration_re = _lazy_re_compile(
r"^(?P<sign>[-+]?)"
r"P"
r"(?:(?P<days>\d+([.,]\d+)?)D)?"
r"(?:T"
r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
r")?"
r"$"
)
}}}
This means Django currently rejects valid duration strings such as:
{{{
P1Y2M
P3W
P1Y
P2M10DT2H
}}}
Despite the documentation suggesting ISO-8601 support, these forms cannot
be parsed.
== **Steps to Reproduce**
{{{
from django.utils.dateparse import parse_duration
parse_duration("P1Y") # returns None
parse_duration("P2M") # returns None
parse_duration("P3W") # returns None
parse_duration("P1Y2M3DT4H") # returns None
}}}
== **Expected Behavior**
`parse_duration()` should parse all valid ISO-8601 durations that can be
represented as `timedelta`, including:
* `PnW`
* `PnYnMnD`
* ...
Django should capture each calendar unit, or clearly state limitations.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/36747#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/0107019aa0e0934f-1cdba3fd-984b-4d64-91fb-f88e0c47dd02-000000%40eu-central-1.amazonses.com.