I was being facetious about the enhancements.  I am more serious about bugs.  I 
would really rather use a maintained rather than a "I got this source at some 
point and use it for some things, but don't rigorously test it" version of 
regcomp & friends.  Can we hold off on that change till we have nothing else 
better to do?

Jim
> On Sep 21, 2016, at 6:27 PM, Zachary Turner <ztur...@google.com> wrote:
> 
> I don't believe so. For the same reason we don't want enhanced on Apple and 
> extended everywhere else. Better if all platforms do the same thing.
> 
> There may be a case to be made for standardizing on std::regex though, it has 
> many different flavors and has been standardized for some time.
> 
> Maybe llvm::Regex could be rewritten in terms of std::regex, that would 
> enable all the cool flavors for everyone (but I imagine it would tickle some 
> strange compatibility bugs and not be entirely smooth)
> 
> 
> On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham <jing...@apple.com> wrote:
> So does the build machinery use the one that is supported on that platform if 
> it is available?  They are going to get much wider testing, fixes and even 
> "ENHANCE"ments...
> 
> Jim
> 
> > On Sep 21, 2016, at 6:07 PM, Zachary Turner <ztur...@google.com> wrote:
> >
> > Windows :)
> >
> > But that was before std::regex. In theory std::regex would work everywhere 
> > (although idk how it performs)
> > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham <jing...@apple.com> wrote:
> > It seems a little odd that llvm has its own forked copy of Henry Spencer's 
> > old regular expression engine?  Are there platforms we care about that 
> > don't have a maintained version of exactly the same code?
> >
> > Jim
> >
> >
> > > On Sep 21, 2016, at 5:42 PM, Zachary Turner <ztur...@google.com> wrote:
> > >
> > > I'll try to address the Regex issue this week (by making everything use 
> > > llvm extended mode regexes).  If someone feels up to the challenge, 
> > > adding support for \d \s etc etc to llvm's regex implementation would 
> > > make a lot of people very happy.
> > >
> > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham <jing...@apple.com> wrote:
> > >
> > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> > > > <lldb-commits@lists.llvm.org> wrote:
> > > >
> > > > Yep, and we can't have any regex objects in LLDB using those because 
> > > > they will only work on Apple and we don't want code like:
> > > >
> > > > #if defined(__APPLE__)
> > > >   RegularExpression e("\s+");
> > > > #else
> > > >   RegularExpression e("[ \t]+");
> > > > #endif
> > > >
> > > > I know "\s" is probably extended, so this was a bad example, but you 
> > > > get my drift.
> > >
> > > Nope, sadly for extended you have to say [[:space:]].  To avoid the 
> > > #define problem, we could require only extended regular expressions in 
> > > lldb code, but let users type the more convenient enhanced one wherever 
> > > the command line uses them.  But beyond these convenient shortcuts there 
> > > probably aren't that many additions that would be useful for the kind of 
> > > regular expressions our users are likely to write.  If we ever provide 
> > > "-r" option to "memory find" the byte literal extension might come in 
> > > handy.  But I don't think that justifies making MacOS builds different.
> > >
> > > If it really bothered us we could go get a more modern regex engine from 
> > > somewhere (rip it out of Tcl or something like that...)
> > >
> > > Jim
> > >
> > >
> > > >
> > > > Greg
> > > >
> > > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner <ztur...@google.com> wrote:
> > > >>
> > > >> That sounds like a plan.  BTW, extended is the one that everyone 
> > > >> supports, enhanced is the one that only apple supports.
> > > >>
> > > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton <gclay...@apple.com> 
> > > >> wrote:
> > > >> And we should check for any "extended" mode regex stuff and get rid of 
> > > >> it since as you said they are not portable. They tend to be shortcuts 
> > > >> for classes of objects and we can just fix the regex to be more 
> > > >> explicit about it. For now we would keep the 
> > > >> lldb_private::RegularExpression class, have it have a llvm::Regex 
> > > >> member and then lldbassert if the regex fails to compile so we can 
> > > >> catch any extended cruft that we might miss so we can fix it...
> > > >>
> > > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton <gclay...@apple.com> wrote:
> > > >>>
> > > >>> To be clear: if we can make StringRef work efficiently, I am fine 
> > > >>> with that. We can just cut over to using llvm::Regex where it uses 
> > > >>> the start and end pointer. I would just like to avoid making string 
> > > >>> copies for no reason. I don't have anything against using StringRef 
> > > >>> as long as we do it efficiently.
> > > >>>
> > > >>> Greg
> > > >>>
> > > >>>
> > > >>>> On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
> > > >>>> <lldb-commits@lists.llvm.org> wrote:
> > > >>>>
> > > >>>>>
> > > >>>>> On Sep 21, 2016, at 4:43 PM, Zachary Turner <ztur...@google.com> 
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> You need to duplicate something on the heap once when you execute 
> > > >>>>> the regex.  And in turn you save tens or hundreds or copies on the 
> > > >>>>> way there because of inefficient string usage.
> > > >>>>
> > > >>>> Where? please show this.
> > > >>>>
> > > >>>> I see the following callers:
> > > >>>>
> > > >>>>        const char *class_name =
> > > >>>>            iterator->second->GetClassName().AsCString("<unknown>");
> > > >>>>        if (regex_up && class_name &&
> > > >>>>            !regex_up->Execute(llvm::StringRef(class_name)))
> > > >>>>
> > > >>>> You are adding a strlen() call here to construct the StringRef, not 
> > > >>>> saving anything.
> > > >>>>
> > > >>>>
> > > >>>> bool CommandObjectRegexCommand::DoExecute(const char *command,
> > > >>>>                                        CommandReturnObject &result) {
> > > >>>> if (command) {
> > > >>>>  EntryCollection::const_iterator pos, end = m_entries.end();
> > > >>>>  for (pos = m_entries.begin(); pos != end; ++pos) {
> > > >>>>    RegularExpression::Match regex_match(m_max_matches);
> > > >>>>
> > > >>>>    if (pos->regex.Execute(command, &regex_match)) {
> > > >>>>      std::string new_command(pos->command);
> > > >>>>      std::string match_str;
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor.
> > > >>>>
> > > >>>>
> > > >>>>  DataVisualization::Categories::ForEach(
> > > >>>>      [&regex, &result](const lldb::TypeCategoryImplSP &category_sp) 
> > > >>>> -> bool {
> > > >>>>        if (regex) {
> > > >>>>          bool escape = true;
> > > >>>>          if (regex->GetText() == category_sp->GetName()) {
> > > >>>>            escape = false;
> > > >>>>          } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
> > > >>>>                         category_sp->GetName()))) {
> > > >>>>            escape = false;
> > > >>>>          }
> > > >>>>
> > > >>>>          if (escape)
> > > >>>>            return true;
> > > >>>>        }
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor.
> > > >>>>
> > > >>>>
> > > >>>>    TypeCategoryImpl::ForEachCallbacks<FormatterType> foreach;
> > > >>>>    foreach
> > > >>>>      .SetExact([&result, &formatter_regex, &any_printed](
> > > >>>>                    ConstString name,
> > > >>>>                    const FormatterSharedPointer &format_sp) -> bool {
> > > >>>>        if (formatter_regex) {
> > > >>>>          bool escape = true;
> > > >>>>          if (name.GetStringRef() == formatter_regex->GetText()) {
> > > >>>>            escape = false;
> > > >>>>          } else if (formatter_regex->Execute(name.GetStringRef())) {
> > > >>>>            escape = false;
> > > >>>>          }
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor.
> > > >>>>
> > > >>>>  bool ParseCoordinate(const char *id_cstr) {
> > > >>>>    RegularExpression regex;
> > > >>>>    RegularExpression::Match regex_match(3);
> > > >>>>
> > > >>>>    llvm::StringRef id_ref = 
> > > >>>> llvm::StringRef::withNullAsEmpty(id_cstr);
> > > >>>>    bool matched = false;
> > > >>>>    if 
> > > >>>> (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
> > > >>>>        regex.Execute(id_ref, &regex_match))
> > > >>>>      matched = true;
> > > >>>>    else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
> > > >>>>             regex.Execute(id_ref, &regex_match))
> > > >>>>      matched = true;
> > > >>>>    else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
> > > >>>>             regex.Execute(id_ref, &regex_match))
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor.
> > > >>>>
> > > >>>> void DWARFCompileUnit::ParseProducerInfo() {
> > > >>>> m_producer_version_major = UINT32_MAX;
> > > >>>> m_producer_version_minor = UINT32_MAX;
> > > >>>> m_producer_version_update = UINT32_MAX;
> > > >>>>
> > > >>>> const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
> > > >>>> if (die) {
> > > >>>>
> > > >>>>  const char *producer_cstr = die->GetAttributeValueAsString(
> > > >>>>      m_dwarf2Data, this, DW_AT_producer, NULL);
> > > >>>>  if (producer_cstr) {
> > > >>>>    RegularExpression llvm_gcc_regex(
> > > >>>>        llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
> > > >>>>                        "Inc\\. build [0-9]+\\) \\(LLVM build "
> > > >>>>                        "[\\.0-9]+\\)$"));
> > > >>>>    if (llvm_gcc_regex.Execute(llvm::StringRef(producer_cstr))) {
> > > >>>>      m_producer = eProducerLLVMGCC;
> > > >>>>    } else if (strstr(producer_cstr, "clang")) {
> > > >>>>      static RegularExpression g_clang_version_regex(
> > > >>>>          llvm::StringRef("clang-([0-9]+)\\.([0-9]+)\\.([0-9]+)"));
> > > >>>>      RegularExpression::Match regex_match(3);
> > > >>>>      if 
> > > >>>> (g_clang_version_regex.Execute(llvm::StringRef(producer_cstr),
> > > >>>>                                        &regex_match)) {
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor 
> > > >>>> (2 of them mind you).
> > > >>>>
> > > >>>> void DWARFDebugPubnamesSet::Find(
> > > >>>>  const RegularExpression &regex,
> > > >>>>  std::vector<dw_offset_t> &die_offset_coll) const {
> > > >>>> DescriptorConstIter pos;
> > > >>>> DescriptorConstIter end = m_descriptors.end();
> > > >>>> for (pos = m_descriptors.begin(); pos != end; ++pos) {
> > > >>>>  if (regex.Execute(pos->name.c_str()))
> > > >>>>    die_offset_coll.push_back(m_header.die_offset + pos->offset);
> > > >>>> }
> > > >>>> }
> > > >>>>
> > > >>>> No copy saved. Just wasted time with strlen in StringRef constructor 
> > > >>>> (2 of them mind you).
> > > >>>>
> > > >>>>
> > > >>>>    std::string slice_str;
> > > >>>>    if (reg_info_dict->GetValueForKeyAsString("slice", slice_str, 
> > > >>>> nullptr)) {
> > > >>>>      // Slices use the following format:
> > > >>>>      //  REGNAME[MSBIT:LSBIT]
> > > >>>>      // REGNAME - name of the register to grab a slice of
> > > >>>>      // MSBIT - the most significant bit at which the current 
> > > >>>> register value
> > > >>>>      // starts at
> > > >>>>      // LSBIT - the least significant bit at which the current 
> > > >>>> register value
> > > >>>>      // ends at
> > > >>>>      static RegularExpression g_bitfield_regex(
> > > >>>>          
> > > >>>> llvm::StringRef("([A-Za-z_][A-Za-z0-9_]*)\\[([0-9]+):([0-9]+)\\]"));
> > > >>>>      RegularExpression::Match regex_match(3);
> > > >>>>      if (g_bitfield_regex.Execute(slice_str, &regex_match)) {
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>> void SourceManager::File::FindLinesMatchingRegex(
> > > >>>>  RegularExpression &regex, uint32_t start_line, uint32_t end_line,
> > > >>>>  std::vector<uint32_t> &match_lines) {
> > > >>>> match_lines.clear();
> > > >>>>
> > > >>>> if (!LineIsValid(start_line) ||
> > > >>>>    (end_line != UINT32_MAX && !LineIsValid(end_line)))
> > > >>>>  return;
> > > >>>> if (start_line > end_line)
> > > >>>>  return;
> > > >>>>
> > > >>>> for (uint32_t line_no = start_line; line_no < end_line; line_no++) {
> > > >>>>  std::string buffer;
> > > >>>>  if (!GetLine(line_no, buffer))
> > > >>>>    break;
> > > >>>>  if (regex.Execute(buffer)) {
> > > >>>>    match_lines.push_back(line_no);
> > > >>>>  }
> > > >>>> }
> > > >>>> }
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>>
> > > >>>> static dw_offset_t
> > > >>>> FindCallbackString(SymbolFileDWARF *dwarf2Data, DWARFCompileUnit *cu,
> > > >>>>                 DWARFDebugInfoEntry *die, const dw_offset_t 
> > > >>>> next_offset,
> > > >>>>                 const uint32_t curr_depth, void *userData) {
> > > >>>> FindCallbackStringInfo *info = (FindCallbackStringInfo *)userData;
> > > >>>>
> > > >>>> if (!die)
> > > >>>>  return next_offset;
> > > >>>>
> > > >>>> const char *die_name = die->GetName(dwarf2Data, cu);
> > > >>>> if (!die_name)
> > > >>>>  return next_offset;
> > > >>>>
> > > >>>> if (info->regex) {
> > > >>>>  if (info->regex->Execute(llvm::StringRef(die_name)))
> > > >>>>    info->die_offsets.push_back(die->GetOffset());
> > > >>>> } else {
> > > >>>>  if ((info->ignore_case ? strcasecmp(die_name, info->name)
> > > >>>>                         : strcmp(die_name, info->name)) == 0)
> > > >>>>    info->die_offsets.push_back(die->GetOffset());
> > > >>>> }
> > > >>>>
> > > >>>> // Just return the current offset to parse the next CU or DIE entry
> > > >>>> return next_offset;
> > > >>>> }
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>>
> > > >>>> bool Get_Impl(ConstString key, MapValueType &value,
> > > >>>>              lldb::RegularExpressionSP *dummy) {
> > > >>>>  llvm::StringRef key_str = key.GetStringRef();
> > > >>>>  std::lock_guard<std::recursive_mutex> guard(m_format_map.mutex());
> > > >>>>  MapIterator pos, end = m_format_map.map().end();
> > > >>>>  for (pos = m_format_map.map().begin(); pos != end; pos++) {
> > > >>>>    lldb::RegularExpressionSP regex = pos->first;
> > > >>>>    if (regex->Execute(key_str)) {
> > > >>>>      value = pos->second;
> > > >>>>      return true;
> > > >>>>    }
> > > >>>>  }
> > > >>>>  return false;
> > > >>>> }
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>>
> > > >>>>        PacketResult echo_packet_result =
> > > >>>>            SendPacketNoLock(llvm::StringRef(echo_packet, 
> > > >>>> echo_packet_len));
> > > >>>>        if (echo_packet_result == PacketResult::Success) {
> > > >>>>          const uint32_t max_retries = 3;
> > > >>>>          uint32_t successful_responses = 0;
> > > >>>>          for (uint32_t i = 0; i < max_retries; ++i) {
> > > >>>>            StringExtractorGDBRemote echo_response;
> > > >>>>            echo_packet_result = 
> > > >>>> WaitForPacketWithTimeoutMicroSecondsNoLock(
> > > >>>>                echo_response, timeout_usec, false);
> > > >>>>            if (echo_packet_result == PacketResult::Success) {
> > > >>>>              ++successful_responses;
> > > >>>>              if 
> > > >>>> (response_regex.Execute(echo_response.GetStringRef())) {
> > > >>>>                sync_success = true;
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>>
> > > >>>> OptionValueSP Instruction::ReadArray(FILE *in_file, Stream 
> > > >>>> *out_stream,
> > > >>>>                                   OptionValue::Type data_type) {
> > > >>>> bool done = false;
> > > >>>> char buffer[1024];
> > > >>>>
> > > >>>> OptionValueSP option_value_sp(new OptionValueArray(1u << data_type));
> > > >>>>
> > > >>>> int idx = 0;
> > > >>>> while (!done) {
> > > >>>>  if (!fgets(buffer, 1023, in_file)) {
> > > >>>>    out_stream->Printf(
> > > >>>>        "Instruction::ReadArray:  Error reading file (fgets).\n");
> > > >>>>    option_value_sp.reset();
> > > >>>>    return option_value_sp;
> > > >>>>  }
> > > >>>>
> > > >>>>  std::string line(buffer);
> > > >>>>
> > > >>>>  size_t len = line.size();
> > > >>>>  if (line[len - 1] == '\n') {
> > > >>>>    line[len - 1] = '\0';
> > > >>>>    line.resize(len - 1);
> > > >>>>  }
> > > >>>>
> > > >>>>  if ((line.size() == 1) && line[0] == ']') {
> > > >>>>    done = true;
> > > >>>>    line.clear();
> > > >>>>  }
> > > >>>>
> > > >>>>  if (!line.empty()) {
> > > >>>>    std::string value;
> > > >>>>    static RegularExpression g_reg_exp(
> > > >>>>        llvm::StringRef("^[ \t]*([^ \t]+)[ \t]*$"));
> > > >>>>    RegularExpression::Match regex_match(1);
> > > >>>>    bool reg_exp_success = g_reg_exp.Execute(line, &regex_match);
> > > >>>>    if (reg_exp_success)
> > > >>>>      regex_match.GetMatchAtIndex(line.c_str(), 1, value);
> > > >>>>    else
> > > >>>>      value = line;
> > > >>>>
> > > >>>> No copy saved.
> > > >>>>
> > > >>>> I could do on with all 31 call sites, but I will stop. All that was 
> > > >>>> added was inefficiency and requiring us to make a copy of the string 
> > > >>>> before it is used to evaluate. Some of these are evaluated in tight 
> > > >>>> loops, like the searching for type summaries and formats that are 
> > > >>>> regex based. Happens once for each item in a SBValue (and once per 
> > > >>>> child that is displayed...
> > > >>>>
> > > >>>>> We could also just un-delete the overload that takes a const char*, 
> > > >>>>> then the duplication would only ever happen when you explicitly use 
> > > >>>>> a StringRef.
> > > >>>>
> > > >>>> For execute it is fine to make one that takes a StringRef, but it 
> > > >>>> would just call .str().c_str() and call the other one. We would 
> > > >>>> assume the "const char *" version of execute would be null 
> > > >>>> terminated.
> > > >>>>
> > > >>>>>
> > > >>>>> I don't agree this should be reverted.  In the process of doing 
> > > >>>>> this conversion I eliminated numerous careless string copies.
> > > >>>>
> > > >>>> As I showed above the opposite was true. We made many calls to 
> > > >>>> strlen to construct the StringRef so we actually made things slower.
> > > >>>>
> > > >>>> Greg
> > > >>>>
> > > >>>>>
> > > >>>>> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton <gclay...@apple.com> 
> > > >>>>> wrote:
> > > >>>>> This it the perfect example of why not to use a StringRef since the 
> > > >>>>> string needs to be null terminated. Why did we change this? Now 
> > > >>>>> even if you call this function:
> > > >>>>>
> > > >>>>> RegularExpression r(...);
> > > >>>>>
> > > >>>>> r.Execute(".......................", ...)
> > > >>>>>
> > > >>>>> You will need to duplicate the string on the heap just to execute 
> > > >>>>> this. Please revert this. Anything that requires null terminate is 
> > > >>>>> not a candidate for converting to StringRef.
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits 
> > > >>>>>> <lldb-commits@lists.llvm.org> wrote:
> > > >>>>>>
> > > >>>>>> Author: zturner
> > > >>>>>> Date: Wed Sep 21 12:13:51 2016
> > > >>>>>> New Revision: 282090
> > > >>>>>>
> > > >>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > > >>>>>> Log:
> > > >>>>>> Fix failing regex tests.
> > > >>>>>>
> > > >>>>>> r282079 converted the regular expression interface to accept
> > > >>>>>> and return StringRefs instead of char pointers.  In one case
> > > >>>>>> a null pointer check was converted to an empty string check,
> > > >>>>>> but this was an incorrect conversion because an empty string
> > > >>>>>> is a valid regular expression.  Removing this check should
> > > >>>>>> fix the test failures.
> > > >>>>>>
> > > >>>>>> Modified:
> > > >>>>>> lldb/trunk/source/Core/RegularExpression.cpp
> > > >>>>>>
> > > >>>>>> Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > > >>>>>> URL: 
> > > >>>>>> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> > > >>>>>> ==============================================================================
> > > >>>>>> --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > > >>>>>> +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 
> > > >>>>>> 12:13:51 2016
> > > >>>>>> @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > > >>>>>> //---------------------------------------------------------------------
> > > >>>>>> bool RegularExpression::Execute(llvm::StringRef str, Match *match) 
> > > >>>>>> const {
> > > >>>>>> int err = 1;
> > > >>>>>> -  if (!str.empty() && m_comp_err == 0) {
> > > >>>>>> +  if (m_comp_err == 0) {
> > > >>>>>>  // Argument to regexec must be null-terminated.
> > > >>>>>>  std::string reg_str = str;
> > > >>>>>>  if (match) {
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> _______________________________________________
> > > >>>>>> lldb-commits mailing list
> > > >>>>>> lldb-commits@lists.llvm.org
> > > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
> > > >>>>>
> > > >>>>
> > > >>>> _______________________________________________
> > > >>>> lldb-commits mailing list
> > > >>>> lldb-commits@lists.llvm.org
> > > >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
> > > >>>
> > > >>
> > > >
> > > > _______________________________________________
> > > > lldb-commits mailing list
> > > > lldb-commits@lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
> > >
> >
> 

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to