Re: [lldb-dev] [llvm-dev] Adding DWARF5 accelerator table support to llvm

Adrian Prantl via lldb-dev Tue, 30 Jan 2018 08:21:47 -0800


> On Jan 30, 2018, at 7:49 AM, Pavel Labath <[email protected]> wrote:
> 
> On 30 January 2018 at 15:41, Adrian Prantl <[email protected]> wrote:
>> 
>> 
>>> On Jan 30, 2018, at 7:35 AM, Pavel Labath <[email protected]> wrote:
>>> 
>>> Hello all,
>>> 
>>> I am looking for feedback regarding implementation of the case folding
>>> algorithm for .debug_names hashes.
>>> 
>>> Unlike the apple tables, the .debug_names hashes are computed from
>>> case-folded names (to enable case-insensitive lookups for languages
>>> where that makes sense). The DWARF5 document specifies that the case
>>> folding should be done according to the "Caseless matching" Section
>>> of the Unicode standard (whose implementation is basically a long list
>>> of special cases). While certainly possible, implementing this would
>>> be much more complicated (and would probably make the code a bit
>>> slower) than a simple tolower(3) call. And the benefits of this are
>>> not really clear to me.
>> 
>> Assuming a UTF-8 encoding, will tolower(3) destroy any non-ASCII characters 
>> in the process? In Swift, for example, we allow a wide range of Unicode
>> characters in identifiers and I want to make sure that this doesn't cause 
>> any problems.
>> 
> 
> I'm not sure what it will do out-of-the-box, but I could certainly
> implement it such that it does not touch the fancy characters.
> 
> However, if we already have Unicode characters in the input, then it
> may make sense to go all the way and implement the full folding
> algorithm. Because, once we start producing hashes like this, it will
> be hard to switch to being fully standard-compliant (as that would
> invalidate the existing hashes).
> 
> But the question then is: can I assume the input names will be Unicode
> (with UTF-8 encoding)?


We can make that happen and encode it explicitly in each compile unit:

> 3.1.1 Full and Partial Compilation Unit Entries
> ...
> A DW_AT_use_UTF8 attribute, which is a flag whose presence indicates that all 
> strings (such as the names of declared entities in the source program, or 
> filenames in the line number table) are represented using the UTF-8 
> representation. 
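
To make the ASCII-only option discussed above concrete, here is a minimal
sketch, assuming the names are UTF-8. The helper names foldAsciiOnly and
djbHash are made up for illustration and are not existing LLVM functions.
The fold only touches the bytes 'A'..'Z', so every byte of a multi-byte
UTF-8 sequence (all of which are >= 0x80) passes through unchanged. The
hash is the DJB function that DWARF v5 specifies for .debug_names. Note
that this is not the full Unicode caseless matching the spec asks for, so
hashes of names with non-ASCII cased letters could differ from those of a
fully compliant producer.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: lower-case ASCII letters only. Bytes >= 0x80, i.e.
// the lead and continuation bytes of multi-byte UTF-8 sequences, are left
// untouched. Comparing by hand also avoids calling tolower(3) with a
// negative char value, which is undefined behaviour.
std::string foldAsciiOnly(std::string_view Name) {
  std::string Out(Name);
  for (char &C : Out)
    if (C >= 'A' && C <= 'Z')
      C = static_cast<char>(C - 'A' + 'a');
  return Out;
}

// DJB hash: start at 5381, then H = H * 33 + byte for every byte of the
// string. Arithmetic wraps modulo 2^32.
uint32_t djbHash(std::string_view S) {
  uint32_t H = 5381;
  for (unsigned char C : S)
    H = H * 33 + C;
  return H;
}
```

With this, djbHash(foldAsciiOnly("MAIN")) and djbHash("main") agree, while
a name such as "\xC3\x84" (U+00C4) keeps its bytes and is not folded to
U+00E4.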

-- adrian