This is an automated email from the ASF dual-hosted git repository.

wave pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/bval-site.git

commit a85b13e9b1f89f940cc00f16da7f36ed97cf03e4
Author: Dave Fisher <[email protected]>
AuthorDate: Tue Jun 1 11:41:13 2021 -0700

    Fixes. Need asfgenid.py plugin for permalinks
---
 content/building.md       |  14 +-
 content/downloads.md      |  69 ++++-----
 content/samples.md        |   2 +-
 pelicanconf.py            |  24 +--
 theme/plugins/README.md   |   3 -
 theme/plugins/asfgenid.py | 384 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 438 insertions(+), 58 deletions(-)

diff --git a/content/building.md b/content/building.md
index 7cdcfe1..56450d5 100644
--- a/content/building.md
+++ b/content/building.md
@@ -48,15 +48,15 @@ command. You may also want to add a maven central mirror repository to your
 1. Checkout the source as described above
 1. Build the source using Maven as described above
 
-```sh
-    mvn eclipse:eclipse
-```
+   ```sh
+   mvn eclipse:eclipse
+   ```
 
-    If this is the first project in your workspace to use maven artifacts you need to create a classpath variable named M2_REPO which contains the full path to your local repository. The eclipse plugin can do this for you with the following command:
+   If this is the first project in your workspace to use maven artifacts you need to create a classpath variable named M2_REPO which contains the full path to your local repository. The eclipse plugin can do this for you with the following command:
 
-```sh
-    mvn eclipse:configure-workspace -Declipse.workspace=${path_to_your_workspace}
-```
+   ```sh
+   mvn eclipse:configure-workspace -Declipse.workspace=${path_to_your_workspace}
+   ```
 
 1. Start Eclipse (3.4 or later suggested) and create a new workspace
 1. Import the BVal project, by:
diff --git a/content/downloads.md b/content/downloads.md
index 6153634..25c649a 100644
--- a/content/downloads.md
+++ b/content/downloads.md
@@ -18,7 +18,7 @@ how to verify the integrity of downloaded files.
 
 #### Apache BVal 2.0.5 - Java 8 - Bean Validation v2.0 - Released October 26 2020
 Module | Artifact | Signatures | Comments
-- | - | - | -
+--|--|--|--
 Source Distribution | [bval-parent-2.0.5-source-release.zip][src202] | [asc][src-asc202] [sha512][src-sha512202] | -
 JSR380 Implementation | [bval-jsr-2.0.5.jar][jsr202] | [asc][jsr-asc202] [md5][jsr-md5202] [sha1][jsr-sha1202] | `javax.validation.spi.ValidationProvider`
 Implementation Bundle | [org.apache.bval.bundle-2.0.5.jar][bundle202] | [asc][bundle-asc202] [md5][bundle-md5202] [sha1][bundle-sha1202] | `javax.validation.spi.ValidationProvider` w/ OSGi metadata (includes `bval-jsr`)
@@ -43,7 +43,7 @@ Extra Routines and Constraints | [bval-extras-2.0.5.jar][bvextras202] | [asc][bv
 #### Apache BVal 1.1.2 - Java 6 - Bean Validation v1.1 - Released Nov 3 2016
 
 Module | Artifact | Signatures | Comments
-- | - | - | -
+--|--|--|--
 Source Distribution | [bval-parent-1.1.2-source-release.zip][src112] | [asc][src-asc112] [md5][src-md5112] [sha1][src-sha1112] | -
 Core Framework | [bval-core-1.1.2.jar][core112] | [asc][core-asc112] [md5][core-md5112] [sha1][core-sha1112] | -
 JSR349 Implementation | [bval-jsr-1.1.2.jar][jsr112] | [asc][jsr-asc112] [md5][jsr-md5112] [sha1][jsr-sha1112] | `javax.validation.spi.ValidationProvider` (requires `bval-core`)
@@ -86,7 +86,7 @@ Note: this release depends on geronimo-validation_1.1_spec API jar or any offici
 #### Apache BVal 0.5 - Java 5 - Bean Validation v1.0 - Released September 21, 2012
 
 Module | Artifact | Signatures | Comments
-- | - | - | -
+--|--|--|--
 Source Distribution | [bval-parent-0.5-source-release.zip][src] | [asc][src-asc] [md5][src-md5] [sha1][src-sha1] | -
 Core Framework | [bval-core-0.5.jar][core] | [asc][core-asc] [md5][core-md5] [sha1][core-sha1] | -
 JSR303 Implementation | [bval-jsr303-0.5.jar][jsr303] | [asc][jsr303-asc] [md5][jsr303-md5] [sha1][jsr303-sha1] | `javax.validation.spi.ValidationProvider` (requires `bval-core`)
@@ -160,17 +160,16 @@ You'll need to add the following dependencies in your builds (and Maven
 will automatically include the additional transitive dependencies for you):
 
 ```html
-
-    <dependency>
-      <groupId>org.apache.geronimo.specs</groupId>
-      <artifactId>geronimo-validation_1.0_spec</artifactId>
-      <version>1.1</version>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.bval</groupId>
-      <artifactId>org.apache.bval.bundle</artifactId>
-      <version>0.5</version>
-    </dependency>
+<dependency>
+  <groupId>org.apache.geronimo.specs</groupId>
+  <artifactId>geronimo-validation_1.0_spec</artifactId>
+  <version>1.1</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.bval</groupId>
+  <artifactId>org.apache.bval.bundle</artifactId>
+  <version>0.5</version>
+</dependency>
 ```
 
 Maven will determine the transitive dependencies for the artifacts, but if
@@ -178,21 +177,21 @@ you are not using Maven to build your project, then you will also need the
 following dependencies on the classpath:
 
 ```html    
-    <dependency>
-       <groupId>org.apache.commons</groupId>
-       <artifactId>commons-lang3</artifactId>
-       <version>3.1</version>
-    </dependency>
-    <dependency>
-       <groupId>org.slf4j</groupId>
-       <artifactId>slf4j-simple</artifactId>
-       <version>1.6.1</version>
-    </dependency>
-    <dependency>
-       <groupId>commons-beanutils</groupId>
-       <artifactId>commons-beanutils</artifactId>
-       <version>1.8.3</version>
-    </dependency>
+<dependency>
+  <groupId>org.apache.commons</groupId>
+  <artifactId>commons-lang3</artifactId>
+  <version>3.1</version>
+</dependency>
+<dependency>
+  <groupId>org.slf4j</groupId>
+  <artifactId>slf4j-simple</artifactId>
+  <version>1.6.1</version>
+</dependency>
+<dependency>
+  <groupId>commons-beanutils</groupId>
+  <artifactId>commons-beanutils</artifactId>
+  <version>1.8.3</version>
+</dependency>
 ```
 
 <a name="Downloads-VerifyingReleases"></a>
@@ -212,22 +211,22 @@ rather than from a
 Then verify the signatures using:
 
 ```sh
-    $ pgpk -a KEYS
-    $ pgpv bval-parent-0.5-source-release.zip.asc
+$ pgpk -a KEYS
+$ pgpv bval-parent-0.5-source-release.zip.asc
 ```
 
 or
 
 ```sh
-    $ pgp -ka KEYS
-    $ pgp bval-parent-0.5-source-release.zip.asc
+$ pgp -ka KEYS
+$ pgp bval-parent-0.5-source-release.zip.asc
 ```
 
 or
 
 ```sh
-    $ gpg --import KEYS
-    $ gpg --verify bval-parent-0.5-source-release.zip.asc
+$ gpg --import KEYS
+$ gpg --verify bval-parent-0.5-source-release.zip.asc
 ```
 
 Alternatively, you can verify the MD5 signature on the files. A Unix/Linux
diff --git a/content/samples.md b/content/samples.md
index 1a28c93..2d80181 100644
--- a/content/samples.md
+++ b/content/samples.md
@@ -1,6 +1,6 @@
 Title: Samples
 
-other projects at Apache, and other external projects that use a ASL 2.0
+Here we have collected some Bean Validation samples from our project, other projects at Apache, and other external projects that use a ASL 2.0
 friendly license. Enjoy!
 
 <a name="Samples-bval-samples"></a>
diff --git a/pelicanconf.py b/pelicanconf.py
index 1754726..c5d8360 100644
--- a/pelicanconf.py
+++ b/pelicanconf.py
@@ -100,7 +100,7 @@ THEME = './theme/apache'
 PLUGIN_PATHS = ['./theme/plugins']
 # PLUGINS = ['asfgenid', 'asfdata', 'pelican-gfm', 'asfreader']
 # We are using the default plugin - 'pelican-gfm' which is installed by the build
-PLUGINS = ['pelican-gfm']
+PLUGINS = ['asfgenid', 'pelican-gfm']
 
 # Lifecycle and plugins:
 # (1) Initialization:
@@ -124,17 +124,17 @@ PLUGINS = ['pelican-gfm']
 # }
 
 # Configure the asfgenid plugin
-# ASF_GENID = {
-#    'metadata': True,
-#    'elements': True,
-#    'headings': True,
-#    'headings_re': r'^h[1-4]',
-#    'permalinks': True,
-#    'toc': True,
-#    'toc_headers': r"h[1-4]",
-#    'tables': True,
-#    'debug': False
-# }
+ASF_GENID = {
+    'metadata': False,
+    'elements': False,
+    'headings': True,
+    'headings_re': r'^h[1-4]',
+    'permalinks': True,
+    'toc': False,
+    'toc_headers': r"h[1-4]",
+    'tables': True,
+    'debug': False
+}
 
 # Sitemap Generator
 # SITEMAP = {
diff --git a/theme/plugins/README.md b/theme/plugins/README.md
deleted file mode 100644
index cf8c404..0000000
--- a/theme/plugins/README.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Placeholder
-
-Need to have this directory to receive the standard plugins
diff --git a/theme/plugins/asfgenid.py b/theme/plugins/asfgenid.py
new file mode 100644
index 0000000..f0ff875
--- /dev/null
+++ b/theme/plugins/asfgenid.py
@@ -0,0 +1,384 @@
+'''
+asfgenid
+===================================
+Generates HeadingIDs, ElementID, and PermaLinks
+First find all specified IDs and classes. Assure unique ID and permalink
+Next find all headings missing IDs. Assure unique ID and permalink
+Generates a Table of Content
+'''
+
+# from __future__ import unicode_literals
+
+import sys
+import traceback
+import re
+import unicodedata
+
+from bs4 import BeautifulSoup, Comment
+
+import pelican.contents
+import pelican.plugins.signals
+
+'''
+Based on
+https://github.com/waylan/Python-Markdown/blob/master/markdown/extensions/headerid.py
+Which is BSD licensed, but is very much rewritten.
+'''
+
+ASF_GENID = {
+    'metadata': True,          # {{ metadata }} inclusion of data in the html.
+    'elements': True,         # {#id} and {.class} annotations.
+    'headings': True,         # add slugified id to headings missing id. Can be overridden by page metadata.
+    'headings_re': r'^h[1-6]', # regex for which headings to check.
+    'permalinks': True,               # add permalinks to elements and headings when id is added.
+    'toc': True,              # check for [TOC] and add Table of Content if present.
+    'toc_headers': r'h[1-6]',  # regex for which headings to include in the [TOC]
+    'tables': True,           # add class="table" for tables missing class.
+    'debug': False
+}
+
+# Fixup tuples for HTML that GFM makes into text.
+FIXUP_UNSAFE = [
+    (re.compile(r'&lt;script'),'<script'),
+    (re.compile(r'&lt;/script'),'</script'),
+    (re.compile(r'&lt;style'),'<style'),
+    (re.compile(r'&lt;/style'),'</style'),
+    (re.compile(r'&lt;iframe'),'<iframe'),
+    (re.compile(r'&lt;/iframe'),'</iframe')
+]
+
+# Find {{ metadata }} inclusions
+METADATA_RE = re.compile(r'{{\s*(?P<meta>[-_:a-zA-Z0-9]+)\s*}}')
+
+# Find {#id} or {.class} elementid annotations
+ELEMENTID_RE = re.compile(r'(?:[ \t]*[{\[][ \t]*(?P<type>[#.])(?P<id>[-._:a-zA-Z0-9 ]+)[}\]])(\n|$)')
+
+# ID duplicates match
+IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')
+
+# For permalinks
+LINK_CHAR = '¶'
+
+# strip permalink chars from headings for ToC
+PARA_MAP = {
+    ord(LINK_CHAR): None
+}
+
+# Find table tags - to check for ones without class attribute.
+TABLE_RE = re.compile(r'^table')
+
+# An item in a Table of Contents - from toc.py
+class HtmlTreeNode(object):
+    def __init__(self, parent, header, level, id):
+        self.children = []
+        self.parent = parent
+        self.header = header
+        self.level = level
+        self.id = id
+
+    def add(self, new_header):
+        new_level = new_header.name
+        new_string = new_header.string
+        new_id = new_header.attrs.get('id')
+
+        if not new_string:
+            new_string = new_header.find_all(
+                text=lambda t: not isinstance(t, Comment),
+                recursive=True)
+            new_string = ''.join(new_string)
+        new_string = new_string.translate(PARA_MAP)
+
+        if self.level < new_level:
+            new_node = HtmlTreeNode(self, new_string, new_level, new_id)
+            self.children += [new_node]
+            return new_node, new_header
+        elif self.level == new_level:
+            new_node = HtmlTreeNode(self.parent, new_string, new_level, new_id)
+            self.parent.children += [new_node]
+            return new_node, new_header
+        elif self.level > new_level:
+            return self.parent.add(new_header)
+
+    def __str__(self):
+        ret = ''
+        if self.parent:
+            ret = "<a class='toc-href' href='#{0}' title='{1}'>{1}</a>".format(
+                self.id, self.header)
+
+        if self.children:
+            ret += "<ul>{}</ul>".format('{}' * len(self.children)).format(
+                *self.children)
+
+        if self.parent:
+            ret = "<li>{}</li>".format(ret)
+
+        if not self.parent:
+            ret = "<div id='toc'>{}</div>".format(ret)
+
+        return ret
+
+
+# assure configuration
+def init_default_config(pelican):
+    from pelican.settings import DEFAULT_CONFIG
+
+    DEFAULT_CONFIG.setdefault('ASF_GENID', ASF_GENID)
+    if(pelican):
+        pelican.settings.setdefault('ASF_GENID', ASF_GENID)
+
+
+# from Apache CMS markdown/extensions/headerid.py - slugify in the same way as the Apache CMS
+def slugify(value, separator):
+    """ Slugify a string, to make it URL friendly. """
+    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
+    value = re.sub('[^\\w\\s-]', '', value.decode('ascii')).strip().lower()
+    return re.sub('[%s\\s]+' % separator, separator, value)
+
+
+# Ensure an id is unique in a set of ids. Append '_1', '_2'... if not
+def unique(id, ids):
+    while id in ids or not id:
+        m = IDCOUNT_RE.match(id)
+        print(f'id="{id}" is a duplicate')
+        if m:
+            id = '%s_%d' % (m.group(1), int(m.group(2)) + 1)
+        else:
+            id = '%s_%d' % (id, 1)
+    ids.add(id)
+    return id
+
+
+# append a permalink
+def permalink(soup, mod_element):
+    new_tag = soup.new_tag('a', href='#' + mod_element['id'])
+    new_tag['class'] = 'headerlink'
+    new_tag['title'] = 'Permalink'
+    new_tag.string = LINK_CHAR
+    mod_element.append(new_tag)
+
+
+# fixup cmark content - note that this may be too hungry. It may need to occur later and skipped in codeblock and pre tags.
+def fixup_content(content):
+    text = content._content
+    modified = False
+    # Find messed up html
+    for regex, replace in FIXUP_UNSAFE:
+        m = regex.search(text)
+        if m:
+            modified = True
+            text = re.sub(regex, replace, text)
+    if modified:
+        content._content = text
+
+
+# expand metadata found in {{ key }}
+def expand_metadata(tag, metadata):
+    this_string = str(tag.string)
+    m = 1
+    modified = False
+    while m:
+        m = METADATA_RE.search(this_string)
+        if m:
+            this_data = m.group(1).strip()
+            format_string = '{{{0}}}'.format(this_data)
+            try:
+                new_string = format_string.format(**metadata)
+                print(f'{{{{{m.group(1)}}}}} -> {new_string}')
+            except Exception:
+                # the data expression was not found
+                print(f'{{{{{m.group(1)}}}}} is not found')
+                new_string = format_string
+            # replace the first pattern with the new_string
+            this_string = re.sub(METADATA_RE, new_string, this_string, count=1)
+            modified = True
+    if modified:
+        tag.string.replace_with(this_string)
+
+
+# do elementid transformation for {#id} and {.class} attribute annotations.
+def elementid_transform(ids, soup, tag, permalinks, perma_set, debug):
+    tagnav = tag.parent
+    this_string = str(tag.string)
+    if debug:
+        print(f'name = {tagnav.name}, string = {this_string}')
+    if tagnav.name not in ['[document]', 'code', 'pre']:
+        m = ELEMENTID_RE.search(tag.string)
+        if m:
+            # this replacement could be better it truncates and likely drops additional annotations
+            tag.string.replace_with(this_string[:m.start()])
+            if m.group('type') == '#':
+                # id attribute annotation
+                tagnav['id'] = unique(m.group('id'), ids)
+                if permalinks:
+                    permalink(soup, tagnav)
+                    unique(tagnav['id'], perma_set)
+                if debug:
+                    print(f'# insertion {tagnav}')
+            else:
+                # class attribute annotation (regex only recognizes the two types)
+                tagnav['class'] = m.group('id')
+                if debug:
+                    print(f'Class {tag.name} : {tagnav["class"]}')
+
+
+# generate id for a heading
+def headingid_transform(ids, soup, tag, permalinks, perma_set):
+    new_string = tag.string
+    if not new_string:
+        # roll up strings if no immediate string
+        new_string = tag.find_all(
+            text=lambda t: not isinstance(t, Comment),
+            recursive=True)
+        new_string = ''.join(new_string)
+
+    # don't have an id create it from text
+    new_id = slugify(new_string, '-')
+    tag['id'] = unique(new_id, ids)
+    if permalinks:
+        permalink(soup, tag)
+        # inform if there is a duplicate permalink
+        unique(tag['id'], perma_set)
+
+
+# generate table of contents from headings after [TOC] content
+def generate_toc(content, tags, title, toc_headers):
+    settoc = False
+    tree = node = HtmlTreeNode(None, title, 'h0', '')
+    # find the last [TOC]
+    taglast = tags[0]
+    for tag in tags:
+        taglast = tag
+    # find all headings after the final [TOC]
+    heading_re = re.compile(toc_headers)
+    for header in taglast.findAllNext(heading_re):
+        # we have heading content for the ToC
+        settoc = True
+        # add the heading.
+        node, _new_header = node.add(header)
+    # convert the ToC to Beautiful Soup
+    tree_soup = ''
+    if settoc:
+        print('  ToC')
+        # convert the HtmlTreeNode into Beautiful Soup
+        tree_string = '{}'.format(tree)
+        tree_soup = BeautifulSoup(tree_string, 'html.parser')
+        # Make the ToC availble to the theme's template
+        content.toc = tree_soup.decode(formatter='html')
+    # replace the first [TOC] with the generated table of contents
+    for tag in tags:
+        tag.replaceWith(tree_soup)
+        # replace additional [TOC] with nothing
+        tree_soup = ''
+
+
+# add the asfdata metadata into GFM content.
+def add_data(content):
+    """ Mix in ASF data as metadata """
+
+    # if the reader is 'asf' then the asf metadata is already in place during asfreader plugin.
+    if content.metadata.get('reader') != 'asf':
+        asf_metadata = content.settings.get('ASF_DATA', { }).get('metadata')
+        if asf_metadata:
+            content.metadata.update(asf_metadata)
+
+
+# main worker transforming the html
+def generate_id(content):
+    if isinstance(content, pelican.contents.Static):
+        return
+
+    # track the id tags
+    ids = set()
+    # track permalinks
+    permalinks = set()
+    
+    # step 1 - fixup html that cmark marks unsafe - move to later?
+    fixup_content(content)
+
+    # step 2 - prepare for genid processes
+    # parse html content into BeautifulSoup4
+    soup = BeautifulSoup(content._content, 'html.parser')
+    # page title
+    title = content.metadata.get('title', 'Title')
+    # assure relative source path is in the metadata
+    content.metadata['relative_source_path'] = content.relative_source_path
+    # display output path and title
+    print(f'{content.relative_source_path} - {title}')
+    # enhance metadata if done by asfreader
+    add_data(content)
+    # get plugin settings
+    asf_genid = content.settings['ASF_GENID']
+    # asf_headings setting may be overridden
+    asf_headings = content.metadata.get('asf_headings', str(asf_genid['headings']))
+    # show active plugins
+    if asf_genid['debug']:
+        print('asfgenid:\nshow plugins in case one is processing before this one')
+        for name in content.settings['PLUGINS']:
+            print(f'plugin: {name}')
+
+    # step 3 - metadata expansion
+    if asf_genid['metadata']:
+        if asf_genid['debug']:
+            print(f'metadata expansion: {content.relative_source_path}')
+        for tag in soup.findAll(string=METADATA_RE):
+            expand_metadata(tag, content.metadata)
+
+    # step 4 - find all id attributes already present
+    for tag in soup.findAll(id=True):
+        unique(tag['id'], ids)
+        # don't change existing ids
+
+    # step 5 - find all {#id} and {.class} text and assign attributes
+    if asf_genid['elements']:
+        if asf_genid['debug']:
+            print(f'elementid: {content.relative_source_path}')
+        for tag in soup.findAll(string=ELEMENTID_RE):
+            elementid_transform(ids, soup, tag, asf_genid['permalinks'], permalinks, asf_genid['debug'])
+
+    # step 6 - find all headings w/o ids already present or assigned with {#id} text
+    if asf_headings == 'True':
+        if asf_genid['debug']:
+            print(f'headings: {content.relative_source_path}')
+        # Find heading tags
+        HEADING_RE = re.compile(asf_genid['headings_re'])
+        for tag in soup.findAll(HEADING_RE, id=False):
+            headingid_transform(ids, soup, tag, asf_genid['permalinks'], permalinks)
+
+    # step 7 - find all tables without class
+    if asf_genid['tables']:
+        if asf_genid['debug']:
+            print(f'tables: {content.relative_source_path}')
+        for tag in soup.findAll(TABLE_RE, class_=False):
+            tag['class'] = 'table'
+
+    # step 8 - find TOC tag and generate Table of Contents
+    if asf_genid['toc']:
+        tags = soup('p', text='[TOC]')
+        if tags:
+            generate_toc(content, tags, title, asf_genid['toc_headers'])
+
+    # step 9 - reset the html content
+    content._content = soup.decode(formatter='html')
+
+    # step 10 - output all of the permalinks created
+    for tag in permalinks:
+        print(f'    #{tag}')
+
+
+def tb_connect(pel_ob):
+    """Print any exception, before Pelican chews it into nothingness."""
+    try:
+        generate_id(pel_ob)
+    except:
+        print('-----', file=sys.stderr)
+        print('FATAL: %s' % (pel_ob.relative_source_path), file=sys.stderr)
+        traceback.print_exc()
+        # if we have errors in this module then we want to quit to avoid erasing the site
+        sys.exit(4)
+
+
+def register():
+    pelican.plugins.signals.initialized.connect(init_default_config)
+
+
+pelican.plugins.signals.content_object_init.connect(tb_connect)
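For reference, the heading-ID scheme the new plugin introduces (slugify the heading text, then suffix `_1`, `_2`, ... on collisions) can be exercised outside Pelican. This is a minimal standalone extract of `slugify` and `unique` from the `asfgenid.py` above, with the duplicate-warning print omitted:

```python
import re
import unicodedata

# ID duplicates match (as in asfgenid.py)
IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')

def slugify(value, separator):
    """Slugify a string to make it URL friendly (Apache CMS style)."""
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = re.sub(r'[^\w\s-]', '', value.decode('ascii')).strip().lower()
    return re.sub(r'[%s\s]+' % separator, separator, value)

def unique(id, ids):
    """Ensure id is unique within ids; append _1, _2, ... if not."""
    while id in ids or not id:
        m = IDCOUNT_RE.match(id)
        if m:
            id = '%s_%d' % (m.group(1), int(m.group(2)) + 1)
        else:
            id = '%s_%d' % (id, 1)
    ids.add(id)
    return id

ids = set()
print(slugify('Verifying Releases', '-'))  # verifying-releases
print(unique('downloads', ids))            # downloads
print(unique('downloads', ids))            # downloads_1
```

Note that the plugin calls `unique` twice per heading: once against the page-wide id set and once against a separate permalink set, so a duplicate permalink is reported without renaming the already-assigned id.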
