Julian Hyde created CALCITE-5764:
------------------------------------

             Summary: Puffin, an Awk for Java
                 Key: CALCITE-5764
                 URL: https://issues.apache.org/jira/browse/CALCITE-5764
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde

Create Puffin, which allows a programming model similar to the {{awk}} scripting language.

An {{awk}} program is a collection of rules, each of which is a pair: a predicate and an action. For each line in a file, the rules are applied in sequence, and if a rule's predicate evaluates to true, its action is executed. Then {{awk}} goes on to the next file.

In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a {{Consumer<Line>}}. {{Line}} is a data structure that gives access to the text of the line, regular expression matching, and file-local and global state.

File-local state is allocated by a factory, and each file is processed in a single thread. This allows {{Puffin}} to be invoked on multiple files (or, more generally, sources, including URLs) and to process them in parallel. Global state is shared, and rules must coordinate when they access it.

Here is a simple {{awk}} script that counts the number of non-comment lines in a file:

{code}
!/^#/ {
  ++n;
}
END {
  printf("counter: %d\n", n);
}
{code}

Here is the equivalent Puffin program:

{code}
Puffin.Program<Unit> program =
    Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
        // Count each line that does not start with '#'.
        .add(line -> !line.startsWith("#"),
            line -> line.state().incrementAndGet())
        // After the last line of a file, print that file's count.
        .after(context ->
            context.println("counter: " + context.state().get()))
        .build();
{code}
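To make the rule model described above concrete, here is a minimal, self-contained sketch of a predicate/action loop in plain Java. It is not Puffin's implementation: {{MiniAwk}}, its nested {{Line}} and {{Rule}} classes, and the {{run}} method are hypothetical names chosen for illustration. It assumes a single {{AtomicInteger}} as the file-local state and leaves out global state, regular expression matching, and parallel processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of an awk-like rule engine; not the Puffin API. */
public class MiniAwk {
  /** One input line together with the state of the file it belongs to. */
  static final class Line {
    final String text;
    final AtomicInteger state;

    Line(String text, AtomicInteger state) {
      this.text = text;
      this.state = state;
    }
  }

  /** A rule is a pair: a predicate and an action. */
  static final class Rule {
    final Predicate<Line> predicate;
    final Consumer<Line> action;

    Rule(Predicate<Line> predicate, Consumer<Line> action) {
      this.predicate = predicate;
      this.action = action;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Registers a rule; rules are applied in the order they were added. */
  MiniAwk add(Predicate<Line> predicate, Consumer<Line> action) {
    rules.add(new Rule(predicate, action));
    return this;
  }

  /** Applies every rule to every line of one "file" and returns that
   * file's state. A fresh state object is allocated for each call. */
  AtomicInteger run(List<String> lines) {
    AtomicInteger state = new AtomicInteger();
    for (String text : lines) {
      Line line = new Line(text, state);
      for (Rule rule : rules) {
        if (rule.predicate.test(line)) {
          rule.action.accept(line);
        }
      }
    }
    return state;
  }

  public static void main(String[] args) {
    MiniAwk program = new MiniAwk()
        .add(line -> !line.text.startsWith("#"),
            line -> line.state.incrementAndGet());
    AtomicInteger n =
        program.run(Arrays.asList("# header", "a", "b", "# trailer"));
    System.out.println("counter: " + n.get()); // prints "counter: 2"
  }
}
```

Because {{run}} allocates a new state object on every call, two files could in principle be handed to different threads without the rules sharing any mutable data, which is the property the file-local state factory is meant to provide.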