[
https://issues.apache.org/jira/browse/CALCITE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated CALCITE-5764:
---------------------------------
Description:
Create Puffin, which allows a programming model similar to the {{awk}}
scripting language.
An {{awk}} program is a collection of rules, each of which is a pair: a
predicate and an action. For each line in a file, the rules are applied in
sequence, and if a rule's predicate evaluates to true, its action is executed.
Then {{awk}} goes on to the next line.
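The rule model described above can be sketched in plain Java. This is only an illustration of the predicate/action idea, not Puffin's implementation; the names {{RuleSketch}}, {{Rule}} and {{run}} are hypothetical and do not come from the Calcite code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of the awk rule model; not part of Puffin. */
public class RuleSketch {
  /** A rule pairs a predicate with an action. */
  public static final class Rule {
    final Predicate<String> predicate;
    final Consumer<String> action;

    public Rule(Predicate<String> predicate, Consumer<String> action) {
      this.predicate = predicate;
      this.action = action;
    }
  }

  /** Applies every rule, in order, to each line in turn. */
  public static void run(List<Rule> rules, List<String> lines) {
    for (String line : lines) {
      for (Rule rule : rules) {
        if (rule.predicate.test(line)) {
          rule.action.accept(line);
        }
      }
      // All rules have been tried; move on to the next line.
    }
  }

  public static void main(String[] args) {
    List<String> matched = new ArrayList<>();
    // One rule: collect every line that is not a comment.
    List<Rule> rules =
        Arrays.asList(new Rule(line -> !line.startsWith("#"), matched::add));
    run(rules, Arrays.asList("# comment", "a = 1", "b = 2"));
    System.out.println(matched);
  }
}
```

The inner loop tries every rule against the current line before the outer loop advances, so several rules may fire for the same line.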
Here is a simple {{awk}} script that counts the number of non-comment lines in
a file:
{code}
!/^#/ {
  ++n;
}
END {
  printf("counter: %d\n", n);
}
{code}
Here is the equivalent Puffin program:
{code}
Puffin.Program<Unit> program =
    Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
        .add(line -> !line.startsWith("#"),
            line -> line.state().incrementAndGet())
        .after(context ->
            context.println("counter: " + context.state().get()))
        .build();
{code}
In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a
{{Consumer<Line>}}. {{Line}} is a data structure that gives access to the text
of the line, regular expression matching, and file-local and global state.
{{Puffin}} allows thread-safe parallel processing of multiple files (or more
generally sources, including URLs). File-local state is allocated by a factory,
and each file is processed in a single thread. Therefore rules do not need to
coordinate with rules processing other files.
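To illustrate why file-local state needs no coordination, here is a minimal sketch using only the JDK. It assumes each source is simply a list of lines and that one task handles one source; {{FileLocalStateSketch}}, {{Counter}} and {{countPerSource}} are hypothetical names, not Puffin API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical sketch of per-file state; not part of Puffin. */
public class FileLocalStateSketch {
  /** Plain counter with no synchronization; only one thread uses each instance. */
  public static final class Counter {
    int n;
  }

  /** Counts non-comment lines per source, one task and one Counter per source. */
  public static List<Integer> countPerSource(List<List<String>> sources,
      Supplier<Counter> factory) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<String> source : sources) {
        Callable<Integer> task = () -> {
          Counter state = factory.get(); // fresh state for this source only
          for (String line : source) {
            if (!line.startsWith("#")) {
              ++state.n;
            }
          }
          return state.n;
        };
        futures.add(executor.submit(task));
      }
      List<Integer> counts = new ArrayList<>();
      for (Future<Integer> future : futures) {
        counts.add(future.get());
      }
      return counts;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<List<String>> sources = Arrays.asList(
        Arrays.asList("# header", "a"),
        Arrays.asList("b", "c", "# trailer"));
    System.out.println(countPerSource(sources, Counter::new));
  }
}
```

Because the factory is called inside the task, each {{Counter}} is confined to the thread that processes its source, so an unsynchronized field is enough.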
Global state is also allocated by a factory, but it is shared, and rules must
coordinate when they access it. In the above example, {{u -> new
AtomicInteger()}} is the factory that creates global state.
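Shared state is a different matter. The following sketch, again using only the JDK and hypothetical names ({{GlobalStateSketch}}, {{countAll}}), shows several tasks updating a single {{AtomicInteger}}. It assumes one task per source and is not taken from Puffin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of shared global state; not part of Puffin. */
public class GlobalStateSketch {
  /** Counts non-comment lines across all sources using one shared counter. */
  public static int countAll(List<List<String>> sources) {
    // One counter shared by every task; AtomicInteger makes increments safe.
    AtomicInteger global = new AtomicInteger();
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (List<String> source : sources) {
        tasks.add(() -> {
          for (String line : source) {
            if (!line.startsWith("#")) {
              global.incrementAndGet();
            }
          }
          return null;
        });
      }
      // invokeAll waits for all tasks; get() surfaces any task failure.
      for (Future<Void> future : executor.invokeAll(tasks)) {
        future.get();
      }
      return global.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<List<String>> sources = Arrays.asList(
        Arrays.asList("# header", "a"),
        Arrays.asList("b", "c", "# trailer"));
    System.out.println(countAll(sources));
  }
}
```

A plain {{int}} field in place of the {{AtomicInteger}} could lose updates when two tasks increment at the same time, which is the kind of coordination the description refers to.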
was:
Create Puffin, which allows a programming model similar to the {{awk}}
scripting language.
An {{awk}} program is a collection of rules, each of which is a pair: a
predicate and an action. For each line in a file, the rules are applied in
sequence, and if a rule's predicate evaluates to true, its action is executed.
Then {{awk}} goes on to the next line.
In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a
{{Consumer<Line>}}. {{Line}} is a data structure that gives access to the text
of the line, regular expression matching, and file-local and global state.
File-local state is allocated by a factory, and each file is processed in a
single thread. This allows {{Puffin}} to be invoked on multiple files (or more
generally sources, including URLs) and processed in parallel. Global state is
shared, and rules must coordinate when they access it.
Here is a simple {{awk}} script that counts the number of non-comment lines in
a file:
{code}
!/^#/ { ++n; }
END { printf("counter: %d\n", n); }
{code}
Here is the equivalent Puffin program:
{code}
Puffin.Program<Unit> program =
    Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
        .add(line -> !line.startsWith("#"),
            line -> line.state().incrementAndGet())
        .after(context ->
            context.println("counter: " + context.state().get()))
        .build();
{code}
> Puffin, an Awk for Java
> -----------------------
>
> Key: CALCITE-5764
> URL: https://issues.apache.org/jira/browse/CALCITE-5764
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
>
> Create Puffin, which allows a programming model similar to the {{awk}}
> scripting language.
> An {{awk}} program is a collection of rules, each of which is a pair: a
> predicate and an action. For each line in a file, the rules are applied in
> sequence, and if the predicate evaluates to true, the action is executed.
> Then {{awk}} goes on to the next line.
> Here is a simple {{awk}} script that counts the number of non-comment lines
> in a file:
> {code}
> !/^#/ {
>   ++n;
> }
> END {
>   printf("counter: %d\n", n);
> }
> {code}
> Here is the equivalent Puffin program:
> {code}
> Puffin.Program<Unit> program =
>     Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
>         .add(line -> !line.startsWith("#"),
>             line -> line.state().incrementAndGet())
>         .after(context ->
>             context.println("counter: " + context.state().get()))
>         .build();
> {code}
> In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a
> {{Consumer<Line>}}. {{Line}} is a data structure that gives access to the
> text of the line, regular expression matching, and file-local and global
> state.
> {{Puffin}} allows thread-safe parallel processing of multiple files (or more
> generally sources, including URLs). File-local state is allocated by a
> factory, and each file is processed in a single thread. Therefore rules do
> not need to coordinate with rules processing other files.
> Global state is also allocated by a factory, but it is shared, and rules must
> coordinate when they access it. In the above example, {{u -> new
> AtomicInteger()}} is the factory that creates global state.