On Wed, Sep 15, 2021 at 10:52:40PM +0000, Glenn Washburn wrote:
> On Wed, 15 Sep 2021 16:52:28 +0200
> Daniel Kiper <dki...@net-space.pl> wrote:
>
> > On Tue, Sep 14, 2021 at 06:19:03PM +0000, Glenn Washburn wrote:
> > > On Tue, 14 Sep 2021 16:27:55 +0200
> > > Daniel Kiper <dki...@net-space.pl> wrote:
> > >
> > > > On Fri, Sep 10, 2021 at 04:03:23PM +0000, Glenn Washburn wrote:
> > > > > This code was broken by commit 3f05d693 ("malloc: Use overflow
> > > > > checking primitives where we do complex allocations"), which
> > > > > added overflow checking in many areas. The problem here is that
> > > > > the changes update the local variable sz, which was already in
> > > > > use and which was not updated before the change. So the code
> > > > > using sz was getting a different value than it would have
> > > > > previously for the same UDF image. This causes the logic
> > > > > getting the destination of the symlink to not realize that it
> > > > > has gotten the full destination, so it keeps trying to read
> > > > > past the end of the destination. The bytes after the end are
> > > > > generally NULL padding bytes, but that's not a valid component
> > > > > type (ECMA-167 14.16.1.1). So grub_udf_read_symlink branches
> > > > > to error logic, returning NULL, instead of the symlink
> > > > > destination path.
> > > > >
> > > > > The result of this bug is that the UDF filesystem tests were
> > > > > failing in the symlink test with the grub-fstest error message:
> > > > >
> > > > >     grub-fstest: error: cannot open `(loop0)/sym': invalid symlink.
> > > > >
> > > > > This change stores the result of doubling sz in another local
> > > > > variable s, so as not to modify sz. Also remove the unnecessary
> > > > > grub_add, which increased the output size by 1 to account for
> > > > > a NULL byte. This isn't needed because an output buffer of
> > > > > twice the size sz is already guaranteed to be more than enough
> > > > > to contain the path components converted to UTF-8.
> > > > > The worst case upper bound for the needed output buffer size
> > > > > is (sz-4)*1.5, where 4 is the size
> > > >
> > > > I think 4 comes from the ECMA-167 spec. Could you add a
> > > > reference to it here? The number of the paragraph would be
> > > > perfect...
> > >
> > > It's 14.16.1, basically the same place as the reference earlier,
> > > which is why I didn't include it. But, yes, I can include it.
> >
> > Yes, please.
>
> Ok, will do.
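The sizing argument quoted above can be sanity-checked numerically: for any component size sz >= 4, the allocated 2*sz bytes always covers the worst-case (sz-4)*1.5 bytes of UTF-8 output. A quick illustrative check in Python (the function names are hypothetical, and this is not GRUB code):

```python
# Worst-case output: (sz - 4) payload bytes of 2-byte UTF-16 code
# units, each 2-byte unit growing to at most 3 bytes of UTF-8, i.e. a
# growth factor of 1.5. The header size 4 comes from ECMA-167 14.16.1.
def worst_case_utf8_bytes(sz):
    return (sz - 4) * 3 // 2

# The patch allocates an output buffer of 2 * sz bytes.
def allocated_bytes(sz):
    return 2 * sz

# For every plausible component size, the allocation is sufficient.
assert all(allocated_bytes(sz) >= worst_case_utf8_bytes(sz)
           for sz in range(4, 1 << 16))
```

This is why the extra grub_add of 1 byte for a NUL terminator was unnecessary: 2*sz already exceeds the worst case with room to spare.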
Thanks!

> > > > > of a path component header and 1.5 is the maximum growth in
> > > > > bytes when converting from 2-byte Unicode code points to
> > > > > UTF-8 (from 2 bytes to 3).
> > > >
> > > > Could you explain how you came up with the 1.5 value? It would
> > > > be nice if you could refer to a spec or something like that.
> > >
> > > There is no spec that I know of (but I would be happy to learn of
> > > one). It's something I've deduced based on my understanding of
> > > Unicode, UTF-8, and UTF-16. All Unicode code points that take at
> > > most 2 bytes in UTF-16 (code points < 0x10000) can be represented
> > > in UTF-8 by a maximum of 3 bytes [1]. Longer UTF-16 encodings
> > > don't matter because those will be 4 bytes or longer. The maximum
> > > number of bytes for a UTF-8 encoding of a Unicode
> >
> > The [1] says: "Since Errata DCN-5157, the range of code points was
> > expanded to all code points from Unicode 4.0 (or any newer or older
> > version), which includes Plane 1-16 characters such as Emoji."
> >
> > So, I think your assumption about longer encodings is incorrect for
> > the UDF.
>
> No, I don't believe so. The "assumption", which was actually a
> logical conclusion, that I understand you to be saying is incorrect
> is "Longer UTF-16 encodings don't matter because those will be 4
> bytes or longer".

Yep...

> This was perhaps a little confusing, as there are no UTF-16 encoded
> code points longer than 4 bytes. Be that as it may, I go on to
> explain that they do not matter because those UTF-16 byte strings
> will never occupy more bytes when encoded in UTF-8. The question
> under debate is "how much can a UTF-16 byte string grow when
> re-encoded as UTF-8?". Thus code points that cannot grow (most, in
> fact, shrink!) can be disregarded in the analysis. You talk about
> Unicode Planes 1-16 as relevant, but remember those Planes are the
> ones where the UTF-16 encoding is 4 bytes, and thus the set of code
> points we just disregarded as not relevant.
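The growth bound being debated above can be checked exhaustively over all code points: a 2-byte UTF-16 code unit never needs more than 3 bytes in UTF-8, and Plane 1-16 code points take 4 bytes in both encodings. An illustrative sketch using Python's built-in codecs as a stand-in for GRUB's converter (not part of the original thread or patch):

```python
# Exhaustively verify that UTF-8 never needs more than 1.5x the bytes
# of the UTF-16 encoding of the same code point.
max_ratio = 0.0
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates are not encodable code points
    ch = chr(cp)
    u16 = len(ch.encode('utf-16-le'))
    u8 = len(ch.encode('utf-8'))
    if u16 == 2:
        # BMP: 2 UTF-16 bytes become at most 3 UTF-8 bytes (1.5x).
        assert u8 <= 3
    else:
        # Planes 1-16: 4 bytes in UTF-16 and exactly 4 in UTF-8, so
        # these code points never grow and can be disregarded.
        assert u16 == 4 and u8 == 4
    max_ratio = max(max_ratio, u8 / u16)
print(max_ratio)  # prints 1.5
```

The worst case is hit by BMP code points at or above U+0800 (2 UTF-16 bytes, 3 UTF-8 bytes), which is exactly where the 1.5 factor in the commit message comes from.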
> Perhaps you do not believe that all code points outside of Plane 0
> require no more bytes in UTF-16 than in UTF-8. If so, please provide
> one counterexample, that is, one code point satisfying the
> condition. Likewise, if you believe that there exists a 2-byte code
> point in UTF-16 which occupies 4 bytes in UTF-8, a counterexample
> would help to make your case.
>
> This explanation may provide more clarity on the matter[1].

OK, now everything is clear. It seems to me that the "Longer UTF-16
encodings don't matter because those will be 4 bytes or longer"
sentence should be rephrased a bit to make it less confusing. I think
"don't matter" is especially confusing here. At least it confused me.
Additionally, I think taking some text from [1] would help too.

Anyway, please polish the commit messages and then I will take this
patch.

Daniel

[1] https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Efficiency

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/grub-devel