[GitHub] [tvm] comaniac commented on a change in pull request #7304: [TVMC] Add custom codegen (BYOC) passes for compilation and tuning

GitBox Mon, 08 Feb 2021 10:11:16 -0800


comaniac commented on a change in pull request #7304:
URL: https://github.com/apache/tvm/pull/7304#discussion_r572261529




##########
File path: python/tvm/driver/tvmc/common.py
##########
@@ -76,6 +98,165 @@ def convert_graph_layout(mod, desired_layout):
             )
 
 
+def validate_targets(parse_targets):
+    """
+    Apply a series of validations on the targets provided via CLI.
+    """
+    targets = [t["kind"] for t in parse_targets]
+
+    if len(targets) > len(set(targets)):
+        raise TVMCException("Duplicate target definitions are not allowed")
+
+    if targets[-1] not in TVM_TARGETS:
+        raise TVMCException(f"The last target needs to be a TVM target. Choices: {TVM_TARGETS}")
+
+    tvm_targets = [t for t in targets if t in TVM_TARGETS]
+    tvm_targets_count = len(tvm_targets)
+    if tvm_targets_count > 1:
+        verbose_tvm_targets = ", ".join(tvm_targets)
+        raise TVMCException(
+            f"Only one of the following targets can be used at a time. "
+            f"Found {tvm_targets_count}: {verbose_tvm_targets}."
+        )
+
+
+def tokenize_target(target):
+    """
+    Extract a list of tokens from a target specification text.
+
+    It covers some corner cases that the standard library module
+    'shlex' does not handle, such as the use of "+" as a punctuation character.
+
+
+    Example
+    -------
+
+    For the input `foo -op1=v1 -op2="v ,2", bar -op3=v-4` we
+    should obtain:
+
+        ["foo", "-op1=v1", '-op2="v ,2"', ",", "bar", "-op3=v-4"]
+
+    Parameters
+    ----------
+    target : str
+        Target options sent via CLI arguments
+
+    Returns
+    -------
+    list of str
+        a list of parsed tokens extracted from the target string
+    """
+
+    target_pattern = (
+        r"(\-{0,2}[\w\-]+\=?"
+        r"(?:[\w\+\-]+(?:,[\w\+\-])*|[\'][\w\+\-,\s]+[\']|[\"][\w\+\-,\s]+[\"])*|,)"
+    )
+
+    return re.findall(target_pattern, target)
+
+
+def parse_target(target):
+    """
+    Parse a plain string of targets provided via a command-line
+    argument.
+
+    To send more than one codegen, a comma-separated list
+    is expected. Options start with -<option_name>=<value>.
+
+    We use `tokenize_target` (a regular expression rather than the
+    standard library module 'shlex') to split the argument, so that
+    options defined as quoted strings containing spaces or commas
+    are kept together as a single token.
+
+
+    Example
+    -------
+
+    For the input `--target='foo -op1=v1 -op2="v ,2", bar -op3=v-4'` we
+    should obtain:
+
+      [
+        {
+            "kind": "foo",
+            "opts": {"op1": "v1", "op2": '"v ,2"'},
+            "raw": 'foo -op1=v1 -op2="v ,2"'
+        },
+        {
+            "kind": "bar",
+            "opts": {"op3": "v-4"},
+            "raw": "bar -op3=v-4"
+        }
+      ]
+
+    Parameters
+    ----------
+    target : str
+        Target options sent via CLI arguments
+
+    Returns
+    -------
+    codegens : list of dict
+        This list preserves the order in which codegens were
+        provided via command line. Each dict contains three keys:
+        'kind', containing the name of the codegen; 'opts', containing
+        a key-value mapping of all options passed via CLI; and 'raw',
+        containing the plain string for this codegen
+    """
+    codegens = []
+
+    parsed_tokens = tokenize_target(target)
+
+    split_codegens = []
+    current_codegen = []
+    split_codegens.append(current_codegen)
+    for token in parsed_tokens:
+        # every time there is a comma separating
+        # two codegen definitions, prepare for
+        # a new codegen
+        if token == ",":
+            current_codegen = []
+            split_codegens.append(current_codegen)
+        else:
+            # collect a new token for the current
+            # codegen being parsed
+            current_codegen.append(token)
+
+    # at this point we have a list of lists,
+    # each item on the first list is a codegen definition
+    # in the comma-separated values
+    for codegen_def in split_codegens:
+        # the first is expected to be the name
+        name = codegen_def[0]
+        raw_target = " ".join(codegen_def)
+        all_opts = codegen_def[1:] if len(codegen_def) > 1 else []
+        opts = {}
+        for opt in all_opts:
+            try:
+                # deal with -- prefixed flags
+                if opt.startswith("--"):
+                    opt_name = opt[2:]
+                    opt_value = True
+                else:
+                    opt = opt[1:] if opt.startswith("-") else opt
+                    opt_name, opt_value = opt.split("=", maxsplit=1)
+            except ValueError:
+                raise ValueError(f"Error when parsing '{opt}'")
+
+            opts[opt_name] = opt_value
+
+        codegens.append({"kind": name, "opts": opts, "raw": raw_target})
+
+    return codegens
+
+
+def is_inline_json(target):
+    try:
+        json.loads(target)
+        return True
+    except json.decoder.JSONDecodeError:
+        return False

Review comment:
   My concern is more about code organization and maintenance. `is_inline_json` is a general-purpose helper that is not specific to TVMC, so if we want to expose it, somewhere like `tvm/utils.py` would be more appropriate. However, since it is only used by one function in the same module, making it a nested function inside that caller would avoid the concern altogether.
   
   If you still prefer to keep it as a separate function, please add a docstring.
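A minimal sketch of the nested-function layout suggested above. The caller name `load_target` and its behaviour (return the decoded JSON for inline JSON, otherwise the raw string) are hypothetical, since the diff does not show which function uses `is_inline_json`; only the helper's body is taken from the PR.

```python
import json


def load_target(target):
    """Hypothetical caller, for illustration only: return the decoded
    JSON if ``target`` is inline JSON, otherwise return it unchanged."""

    def is_inline_json(text):
        """Return True if ``text`` can be decoded as a JSON document."""
        try:
            json.loads(text)
            return True
        except json.decoder.JSONDecodeError:
            return False

    if is_inline_json(target):
        return json.loads(target)
    return target


print(load_target('{"kind": "llvm"}'))    # {'kind': 'llvm'}
print(load_target("llvm -mcpu=skylake"))  # llvm -mcpu=skylake
```

Nesting the helper keeps it out of the module's public surface and ties it to its single caller; the trade-off is that it can no longer be reused or unit-tested on its own, which is when a documented top-level function (or a shared utility module) becomes the better choice.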

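To make the docstring examples in the diff concrete, here is a standalone sketch that reuses the regular expression from `tokenize_target` and the grouping and option-splitting rules of `parse_target`. The helper names `split_codegens` and `parse_opts` are hypothetical. Because options are split only on the first `=`, quote characters around a value stay part of that value.

```python
import re

# Regex copied from tokenize_target in the diff above.
TARGET_PATTERN = (
    r"(\-{0,2}[\w\-]+\=?"
    r"(?:[\w\+\-]+(?:,[\w\+\-])*|[\'][\w\+\-,\s]+[\']|[\"][\w\+\-,\s]+[\"])*|,)"
)


def split_codegens(target):
    """Hypothetical helper: tokenize ``target`` and group the tokens per
    comma-separated codegen, mirroring the loop in parse_target."""
    groups = [[]]
    for token in re.findall(TARGET_PATTERN, target):
        if token == ",":
            groups.append([])  # a bare comma starts a new codegen definition
        else:
            groups[-1].append(token)
    return groups


def parse_opts(option_tokens):
    """Hypothetical helper: build an options dict using the same rules
    as parse_target."""
    opts = {}
    for opt in option_tokens:
        if opt.startswith("--"):
            opts[opt[2:]] = True  # a "--flag" token becomes a boolean option
        else:
            opt = opt[1:] if opt.startswith("-") else opt
            name, value = opt.split("=", maxsplit=1)
            opts[name] = value  # value is kept verbatim, quotes included
    return opts


groups = split_codegens('foo -op1=v1 -op2="v ,2", bar -op3=v-4')
print(groups[0])                  # ['foo', '-op1=v1', '-op2="v ,2"']
print(parse_opts(groups[0][1:]))  # {'op1': 'v1', 'op2': '"v ,2"'}
```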


