Le 17/01/2022 à 13:27, Ruediger Pluem a écrit :


On 1/16/22 10:35 PM, William A Rowe Jr wrote:
4 day ago, you all saw this. 6 years ago, you all started using this on trunk.

Don't know what I can to do help this along and honor the library
author's wishes for
all of us to walk away from the dead fork, and use the maintained
fork. It's whatever
it is, I'm out of here and removing the backport application branch,
whoever 3rd upvotes
this be prepared to apply this for us all, thanks.

With regards to the de-optimization / stack usage probably stupid question:
Can't we use the TLS features that several compilers offer?
e.g. see at the end of the page at 
https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably

If we have no thread_local we could fall back to the current unoptimized 
implementation?

Regards

Rüdiger



What about something as stupid as the attached patch?
(just a POC to give the general idea)

Reserve some space on stack (in the spirit of the optimization for PCRE1) and implement a malloc/free that is clever enough to use this space instead a real malloc if pcre2 does not ask for too much space.

The only difference with what is done with PCRE1 is that in this case we use on the stack the exact space needed for for POSIX_MALLOC_THRESHOLD (10) matches. In the attach patch, we reserve "some space" on the stack and we expect that it is large enough to avoid a malloc.


The space required by pcre2 is:
offsetof(pcre2_match_data, ovector) + 2*oveccount*sizeof(PCRE2_SIZE)

With:
typedef struct pcre2_real_match_data {
  pcre2_memctl     memctl;
  const pcre2_real_code *code;    /* The pattern used for the match */
  PCRE2_SPTR       subject;       /* The subject that was matched */
  PCRE2_SPTR       mark;          /* Pointer to last mark */
  PCRE2_SIZE       leftchar;      /* Offset to leftmost code unit */
  PCRE2_SIZE       rightchar;     /* Offset to rightmost code unit */
  PCRE2_SIZE       startchar;     /* Offset to starting code unit */
  uint8_t          matchedby;     /* Type of match (normal, JIT, DFA) */
  uint8_t          flags;         /* Various flags */
  uint16_t         oveccount;     /* Number of pairs */
  int              rc;            /* The return code from the match */
  PCRE2_SIZE       ovector[131072]; /* Must be last in the structure */
} pcre2_real_match_data;

(my understanding is that pcre2_real_match_data and pcre2_match_data are the same)



So we should be able to compute a MAX_PCRE2_STACK_DATA (see patch) that is both large enough to store 10 or so matches, and not too big.

Not sure it is a clean solution, but I guess that it could work.


BTW, does someone have some numbers for the cost of the malloc we try to avoid? (read: are we over engineering the solution?)

CJ
Index: util_pcre.c
===================================================================
--- util_pcre.c	(révision 1896817)
+++ util_pcre.c	(copie de travail)
@@ -272,6 +272,31 @@
                           eflags);
 }
 
+//
+// Code called somewhere at server start to "register" our malloc/free functions
+// For PCRE2 memory allocation optimization
+//
+pcre2_general_context *global_pcre2_context =
+        pcre2_general_context_create(&pcre2_malloc, &pcre2_free, NULL)
+//
+//
+
+#define MAX_PCRE2_STACK_DATA    1024
+#define PCRE2_ON_STACK          ((void *)(-1))
+
+void *pcre2_malloc(size_t size, void *data)
+{
+    if (size <= MAX_PCRE2_STACK_DATA)
+        return PCRE2_ON_STACK;
+    
+    return malloc(size);
+}
+
+void pcre2_free(void *mem, void *data)
+{
+    return free(mem);
+}
+
 AP_DECLARE(int) ap_regexec_len(const ap_regex_t *preg, const char *buff,
                                apr_size_t len, apr_size_t nmatch,
                                ap_regmatch_t *pmatch, int eflags)
@@ -282,11 +307,12 @@
 #ifdef HAVE_PCRE2
     pcre2_match_data *matchdata;
     size_t *ovector;
+    char small_ovector[MAX_PCRE2_STACK_DATA];
 #else
     int small_ovector[POSIX_MALLOC_THRESHOLD * 3];
-    int allocated_ovector = 0;
     int *ovector = NULL;
 #endif
+    int allocated_ovector = 0;
 
     if ((eflags & AP_REG_NOTBOL) != 0)
         options |= PCREn(NOTBOL);
@@ -304,9 +330,17 @@
      */
     nlim = ((apr_size_t)preg->re_nsub + 1) > nmatch
          ? ((apr_size_t)preg->re_nsub + 1) : nmatch;
-    matchdata = pcre2_match_data_create(nlim, NULL);
+
+    allocated_ovector = 1;
+    matchdata = pcre2_match_data_create(nlim, global_pcre2_context);
     if (matchdata == NULL)
         return AP_REG_ESPACE;
+    if (matchdata == PCRE2_ON_STACK)
+    {
+        allocated_ovector = 0;
+        matchdata = small_ovector;
+    }
+        
     ovector = pcre2_get_ovector_pointer(matchdata);
     rc = pcre2_match((const pcre2_code *)preg->re_pcre,
                      (const unsigned char *)buff, len,
@@ -343,7 +377,8 @@
     }
 
 #ifdef HAVE_PCRE2
-    pcre2_match_data_free(matchdata);
+    if (allocated_ovector)
+        pcre2_match_data_free(matchdata);
 #else
     if (allocated_ovector)
         free(ovector);

Reply via email to