On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> Add a new linemap reason LC_GEN which enables encoding the location
> of data
> that was generated during compilation and does not appear in any
> source file.
> There could be many use cases, such as, for instance, referring to
> the content
> of builtin macros (not yet implemented, but an easy lift after this
> one.) The
> first intended application is to create a place to store the input to
> a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
>
> The actual change needed to the line-maps API in libcpp is not too
> large and
> requires no space overhead in the line map data structures (on 64-bit
> systems
> that is; one newly added data member to class line_map_ordinary sits
> inside
> former padding bytes.) An LC_GEN map is just an ordinary map like any
> other,
> but the TO_FILE member that normally points to the file name points
> instead to
> the actual data. This works automatically with PCH as well, for the
> same
> reason that the file name makes its way into a PCH. In order to
> avoid
> confusion, the member has been renamed from TO_FILE to DATA, and
> associated
> accessors adjusted.
>
> Outside libcpp, there are many small changes but most of them are to
> selftests, which are necessarily more sensitive to implementation
> details. From the perspective of the user (the "user", here, being a
> frontend
> using line maps or else the diagnostics infrastructure), the chief
> visible
> change is that the function location_get_source_line() should be
> passed an
> expanded_location object instead of a separate filename and line
> number. This
> is not a big change because in most cases, this information came
> anyway from a
> call to expand_location and the needed expanded_location object is
> readily
> available. The new overload of location_get_source_line() uses the
> extra
> information in the expanded_location object to obtain the data from
> the
> in-memory buffer when it originated from an LC_GEN map.
>
> Until the subsequent patch that starts using LC_GEN maps, none are
> yet
> generated within GCC, hence nothing is added to the testsuite here;
> but all
> relevant selftests have been extended to cover generated data maps in
> addition
> to normal files.
[..snip...]
Thanks for the updated patch.
Reading this patch, it felt a bit unnatural to me to have an
(exploded location, source line)
pair where the exploded location seems to be representing "which source
file or generated buffer", but the line/column info in that
exploded_location is to be ignored in favor of the 2nd source line.
I think we're missing a class: something that identifies either a
specific source file, or a specific generated buffer.
How about something like either:
class source_id
{
public:
source_id (const char *filename)
: m_filename_or_buffer (filename),
m_len (0)
{
}
explicit source_id (const char *buffer, unsigned buffer_len)
: m_filename_or_buffer (buffer),
m_len (buffer_len)
{
linemap_assert (buffer_len > 0);
}
private:
const char *m_filename_or_buffer;
unsigned m_len; // where 0 means "it's a filename"
};
or:
class source_id
{
public:
source_id (const char *filename)
: m_ptr (filename),
m_is_buffer (false)
{
}
explicit source_id (const linemap_ordinary *buffer_linemap)
: m_ptr (buffer_linemap),
m_is_buffer (true)
{
}
private:
const void *m_ptr;
bool m_is_buffer;
};
and use one of these "source_id file" in place of "const char *file",
rather than replacing such things with expanded_location?
> diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> index e8d3dece770..4164fa0b1ba 100644
> --- a/gcc/c-family/c-indentation.cc
> +++ b/gcc/c-family/c-indentation.cc
> @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> unsigned int *first_nws,
> unsigned int tab_width)
> {
> - char_span line = location_get_source_line (exploc.file, exploc.line);
> + char_span line = location_get_source_line (exploc);
...so this might contine to be:
char_span line = location_get_source_line (exploc.file, exploc.line);
...but expanded_location's "file" field would become a source_id,
rather than a const char *. It looks like doing do might make a lot of
"is this the same file or buffer?" turn into comparisons of source_id
instances.
So I think expanded_location would become:
typedef struct
{
/* Either the name of the source file involved, or the
specific generated buffer. */
source_id file;
/* The line-location in the source file. */
int line;
int column;
void *data;
/* In a system header?. */
bool sysp;
} expanded_location;
and we wouldn't need to add these extra fields:
> +
> + /* If generated data, the data and its length. The data may contain
> embedded
> + nulls and need not be null-terminated. */
> + unsigned int generated_data_len;
> + const char *generated_data;
> } expanded_location;
and we could pass around source_id instances when identifying specific
filenames/generated buffers.
Does this idea simplify/clarify the patch, or make it more complicated?
[...snip...]
Thoughts?
Dave