Re: [PATCH v2 2/4] cli: Extend the search command for --output=addresses and similar

Michal Sojka Thu, 09 Oct 2014 03:56:55 -0700

On Mon, Oct 06 2014, Tomi Ollila wrote:
> On Sun, Oct 05 2014, Michal Sojka <[email protected]> wrote:
>
>> The new outputs allow printing senders, recipients or both of matching
>> messages.
>>
>> This code based on a patch from Jani Nikula.
>
> OK, IMO...
>
> 1/4 OK
>
> Before 2/4 add support for 'flag' arguments, drop the --output=addresses
> option which is now done as --output=sender --output=recipients


OK

> In deduplication comment did not describe the deduplication at all...
> so I looked a bit into the code now... the Default you described was
> that with "John Doe" <[email protected]> and "John Doe" 
> <[email protected]>
> only one was printed (but not which one).

I intentionally didn't want to define which one, but I agree that it
might be useful in same cases. It would depend on --sort option and on
the order of addresses in email headers.

> Secondly, what happens with "Doe, John" <[email protected]> and
> "John Doe" <[email protected]>... ah, it is same as *addr* with
> case-insensitive address.
>
> Sorry, but IMO these options are a bit strange.

My impression is that I did bad job describing the deduplication
algorithm, which is why you don't understand it. Maybe, we can also
change the name of the option to --filter-by, or something like this.

When thinking about how to best document such an option, it seems that
the user must be aware that this is implemented as flags that are ORed.
Which means that the default should be what was in the previous patch
--unique=none.

What about the following?

    ``--filter-flag=``\ (**addr**\ \|\ **name**\ \|\ **addrfold**)

        Can be used with ``--output=addresses``, ``--output=sender`` or
        ``--output=recipients`` to filter out duplicate addresses. The
        filtering algorithm receives a sequence of email addresses and
        outputs the same sequence without the addresses that are
        considered a duplicate of a previously output address. What is
        considered a duplicate depends on the flags given:

        **addr** means that the address part is compared.
        Case-sensitivity can be controlled by **addrfold** flag (see
        below). For example, the addresses "John Doe <[email protected]>"
        and "Dr. John Doe <[email protected]>" will be considered
        duplicate.

        **name** means that the name part is compared in case-sensitive
        manner. For example, the addresses "John Doe <[email protected]>"
        and "John Doe <[email protected]>" will be considered duplicate.

        **addrfold** when used with the **addr** flag, the address
        comparison is performed in case-insensitive manner. For example,
        the addresses "John Doe <[email protected]>" and "Dr. John Doe
        <[email protected]>" will be considered duplicate.

        To specify multiple flags, this option can be given multiple
        times. For example, ``--filter-flag=name --filter-flag=addr``
        will print unique case-sensitive combinations of both name and
        address parts.

With this, the previously default behavior would now has to be spelled
as "--filter-flag=addr --filter-flag=addrfold".

I'm not sure it is wise present such a low-level interface (flags) to
command-line users, but it is hopefully more understandable now. What do
you think?

> Not to go to choose which one to choose (first, last, most common) instead
> of the suggested options these should be the ones:
>
> 1) "John Doe" <[email protected]> and "John Doe" <[email protected]>:
> only one printed, but if either were "Dr. John Doe", both of these are printed
> (this as default).

According to the above, which could be achieved as --filter-flag=name
--filter-flag=addr --filter-flag=addrfold.

> 2) same as above, but only make case-insensitive

case-insensitive is already in 1), you probably mean case-sensitive.

> address match -- i.e. in the 2 above cases in option 1, print only
> one.

This would be --filter-flag=name --filter-flag=addr.

> (and same name but different address to perhaps never been an option...)
>
> I might like to have option that does case-sensitive address match, 

This would be just --filter-flag=addr.

As a side note, it is interesting, that you mentioned your options as an
enumeration even though they are actually combinations of several on/off
flags. I think that it is more natural for human brains to think in
terms of simple lists than in terms of combinations of flags. That's why
I originally implemented --output=addresses as just another keyword,
rather than requiring the user to specify both sender and receivers.

Thanks for the review.
-Michal

> In those cases I don't know the recipient's culture and the email he
> sent to me used format <[email protected]> (and not knowing which
> one is the first and which last name (or whatever names these are) --
> just to reply in same case format in respect...


>
>
> Tomi
>
>
>> ---
>>  completion/notmuch-completion.bash |   2 +-
>>  completion/notmuch-completion.zsh  |   3 +-
>>  doc/man1/notmuch-search.rst        |  22 +++++++-
>>  notmuch-search.c                   | 100 
>> ++++++++++++++++++++++++++++++++++---
>>  test/T090-search-output.sh         |  64 ++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+), 9 deletions(-)
>>
>> diff --git a/completion/notmuch-completion.bash 
>> b/completion/notmuch-completion.bash
>> index 0571dc9..c37ddf5 100644
>> --- a/completion/notmuch-completion.bash
>> +++ b/completion/notmuch-completion.bash
>> @@ -294,7 +294,7 @@ _notmuch_search()
>>          return
>>          ;;
>>      --output)
>> -        COMPREPLY=( $( compgen -W "summary threads messages files tags" -- 
>> "${cur}" ) )
>> +        COMPREPLY=( $( compgen -W "summary threads messages files tags 
>> sender recipients addresses" -- "${cur}" ) )
>>          return
>>          ;;
>>      --sort)
>> diff --git a/completion/notmuch-completion.zsh 
>> b/completion/notmuch-completion.zsh
>> index 67a9aba..bff8fd5 100644
>> --- a/completion/notmuch-completion.zsh
>> +++ b/completion/notmuch-completion.zsh
>> @@ -52,7 +52,8 @@ _notmuch_search()
>>    _arguments -s : \
>>      '--max-threads=[display only the first x threads from the search 
>> results]:number of threads to show: ' \
>>      '--first=[omit the first x threads from the search results]:number of 
>> threads to omit: ' \
>> -    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological 
>> order" oldest-first\:"chronological order"))'
>> +    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological 
>> order" oldest-first\:"chronological order"))' \
>> +    '--output=[select what to output]:output:((summary threads messages 
>> files tags sender recipients addresses))'
>>  }
>>
>>  _notmuch()
>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
>> index 90160f2..3447820 100644
>> --- a/doc/man1/notmuch-search.rst
>> +++ b/doc/man1/notmuch-search.rst
>> @@ -35,7 +35,7 @@ Supported options for **search** include
>>          intended for programs that invoke **notmuch(1)** internally. If
>>          omitted, the latest supported version will be used.
>>
>> -    ``--output=(summary|threads|messages|files|tags)``
>> +    
>> ``--output=(summary|threads|messages|files|tags|sender|recipients|addresses)``
>>
>>          **summary**
>>              Output a summary of each thread with any message matching
>> @@ -78,6 +78,26 @@ Supported options for **search** include
>>              by null characters (--format=text0), as a JSON array
>>              (--format=json), or as an S-Expression list (--format=sexp).
>>
>> +    **sender**
>> +            Output all addresses from the *From* header that appear on
>> +            any message matching the search terms, either one per line
>> +            (--format=text), separated by null characters
>> +            (--format=text0), as a JSON array (--format=json), or as
>> +            an S-Expression list (--format=sexp).
>> +
>> +        Note: Searching for **sender** should be much faster than
>> +        searching for **recipients** or **addresses**, because
>> +        sender addresses are cached directly in the database
>> +        whereas other addresses need to be fetched from message
>> +        files.
>> +
>> +    **recipients**
>> +            Like **sender** but for addresses from *To*, *Cc* and
>> +        *Bcc* headers.
>> +
>> +    **addresses**
>> +        Like **sender** and **recipients** together.
>> +
>>      ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
>>          This option can be used to present results in either
>>          chronological order (**oldest-first**) or reverse chronological
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index 5ac2a26..0614f10 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -23,11 +23,14 @@
>>  #include "string-util.h"
>>
>>  typedef enum {
>> -    OUTPUT_SUMMARY,
>> -    OUTPUT_THREADS,
>> -    OUTPUT_MESSAGES,
>> -    OUTPUT_FILES,
>> -    OUTPUT_TAGS
>> +    OUTPUT_SUMMARY  = 1 << 0,
>> +    OUTPUT_THREADS  = 1 << 1,
>> +    OUTPUT_MESSAGES = 1 << 2,
>> +    OUTPUT_FILES    = 1 << 3,
>> +    OUTPUT_TAGS             = 1 << 4,
>> +    OUTPUT_SENDER   = 1 << 5,
>> +    OUTPUT_RECIPIENTS       = 1 << 6,
>> +    OUTPUT_ADDRESSES        = OUTPUT_SENDER | OUTPUT_RECIPIENTS,
>>  } output_t;
>>
>>  typedef struct {
>> @@ -220,6 +223,67 @@ do_search_threads (search_options_t *o)
>>      return 0;
>>  }
>>
>> +static void
>> +print_address_list (const search_options_t *o, InternetAddressList *list)
>> +{
>> +    InternetAddress *address;
>> +    int i;
>> +
>> +    for (i = 0; i < internet_address_list_length (list); i++) {
>> +    address = internet_address_list_get_address (list, i);
>> +    if (INTERNET_ADDRESS_IS_GROUP (address)) {
>> +        InternetAddressGroup *group;
>> +        InternetAddressList *group_list;
>> +
>> +        group = INTERNET_ADDRESS_GROUP (address);
>> +        group_list = internet_address_group_get_members (group);
>> +        if (group_list == NULL)
>> +            continue;
>> +
>> +        print_address_list (o, group_list);
>> +    } else {
>> +        InternetAddressMailbox *mailbox;
>> +        const char *name;
>> +        const char *addr;
>> +        char *full_address;
>> +
>> +        mailbox = INTERNET_ADDRESS_MAILBOX (address);
>> +
>> +        name = internet_address_get_name (address);
>> +        addr = internet_address_mailbox_get_addr (mailbox);
>> +
>> +        if (name && *name)
>> +            full_address = talloc_asprintf (o->format, "%s <%s>", name, 
>> addr);
>> +        else
>> +            full_address = talloc_strdup (o->format, addr);
>> +
>> +        if (!full_address) {
>> +            fprintf (stderr, "Error: out of memory\n");
>> +            break;
>> +        }
>> +        o->format->string (o->format, full_address);
>> +        o->format->separator (o->format);
>> +
>> +        talloc_free (full_address);
>> +    }
>> +    }
>> +}
>> +
>> +static void
>> +print_address_string (const search_options_t *o, const char *recipients)
>> +{
>> +    InternetAddressList *list;
>> +
>> +    if (recipients == NULL)
>> +    return;
>> +
>> +    list = internet_address_list_parse_string (recipients);
>> +    if (list == NULL)
>> +    return;
>> +
>> +    print_address_list (o, list);
>> +}
>> +
>>  static int
>>  do_search_messages (search_options_t *o)
>>  {
>> @@ -266,11 +330,29 @@ do_search_messages (search_options_t *o)
>>
>>          notmuch_filenames_destroy( filenames );
>>
>> -    } else { /* output == OUTPUT_MESSAGES */
>> +    } else if (o->output == OUTPUT_MESSAGES) {
>>          format->set_prefix (format, "id");
>>          format->string (format,
>>                          notmuch_message_get_message_id (message));
>>          format->separator (format);
>> +    } else {
>> +        if (o->output & OUTPUT_SENDER) {
>> +            const char *addrs;
>> +
>> +            addrs = notmuch_message_get_header (message, "from");
>> +            print_address_string (o, addrs);
>> +        }
>> +
>> +        if (o->output & OUTPUT_RECIPIENTS) {
>> +            const char *hdrs[] = { "to", "cc", "bcc" };
>> +            const char *addrs;
>> +            size_t j;
>> +
>> +            for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
>> +                addrs = notmuch_message_get_header (message, hdrs[j]);
>> +                print_address_string (o, addrs);
>> +            }
>> +        }
>>      }
>>
>>      notmuch_message_destroy (message);
>> @@ -370,6 +452,9 @@ notmuch_search_command (notmuch_config_t *config, int 
>> argc, char *argv[])
>>        (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
>>                                { "threads", OUTPUT_THREADS },
>>                                { "messages", OUTPUT_MESSAGES },
>> +                              { "sender", OUTPUT_SENDER },
>> +                              { "recipients", OUTPUT_RECIPIENTS },
>> +                              { "addresses", OUTPUT_ADDRESSES },
>>                                { "files", OUTPUT_FILES },
>>                                { "tags", OUTPUT_TAGS },
>>                                { 0, 0 } } },
>> @@ -461,6 +546,9 @@ notmuch_search_command (notmuch_config_t *config, int 
>> argc, char *argv[])
>>      ret = do_search_threads (&o);
>>      break;
>>      case OUTPUT_MESSAGES:
>> +    case OUTPUT_SENDER:
>> +    case OUTPUT_RECIPIENTS:
>> +    case OUTPUT_ADDRESSES:
>>      case OUTPUT_FILES:
>>      ret = do_search_messages (&o);
>>      break;
>> diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh
>> index 947d572..5458de1 100755
>> --- a/test/T090-search-output.sh
>> +++ b/test/T090-search-output.sh
>> @@ -387,6 +387,70 @@ cat <<EOF >EXPECTED
>>  EOF
>>  test_expect_equal_file OUTPUT EXPECTED
>>
>> +test_begin_subtest "--output=sender"
>> +notmuch search --output=sender '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Adrian Perez de Castro <[email protected]>
>> +      2 Alex Botero-Lowry <[email protected]>
>> +      4 Alexander Botero-Lowry <[email protected]>
>> +      1 Aron Griffis <[email protected]>
>> +     12 Carl Worth <[email protected]>
>> +      1 Chris Wilson <[email protected]>
>> +      1 François Boulogne <[email protected]>
>> +      1 Ingmar Vanhassel <[email protected]>
>> +      1 Israel Herraiz <[email protected]>
>> +      4 Jan Janak <[email protected]>
>> +      2 Jjgod Jiang <[email protected]>
>> +      7 Keith Packard <[email protected]>
>> +      5 Lars Kellogg-Stedman <[email protected]>
>> +      5 Mikhail Gusarov <[email protected]>
>> +      1 Olivier Berger <[email protected]>
>> +      1 Rolland Santimano <[email protected]>
>> +      3 Stewart Smith <[email protected]>
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>> +test_begin_subtest "--output=recipients"
>> +notmuch search --output=recipients '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Allan McRae <[email protected]>
>> +      1 Discussion about the Arch User Repository (AUR) 
>> <[email protected]>
>> +      1 Keith Packard <[email protected]>
>> +      1 Mikhail Gusarov <[email protected]>
>> +      2 notmuch <[email protected]>
>> +     48 [email protected]
>> +      1 [email protected]
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>> +test_begin_subtest "--output=addresses"
>> +notmuch search --output=addresses '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Adrian Perez de Castro <[email protected]>
>> +      2 Alex Botero-Lowry <[email protected]>
>> +      4 Alexander Botero-Lowry <[email protected]>
>> +      1 Allan McRae <[email protected]>
>> +      1 Aron Griffis <[email protected]>
>> +     12 Carl Worth <[email protected]>
>> +      1 Chris Wilson <[email protected]>
>> +      1 Discussion about the Arch User Repository (AUR) 
>> <[email protected]>
>> +      1 François Boulogne <[email protected]>
>> +      1 Ingmar Vanhassel <[email protected]>
>> +      1 Israel Herraiz <[email protected]>
>> +      4 Jan Janak <[email protected]>
>> +      2 Jjgod Jiang <[email protected]>
>> +      8 Keith Packard <[email protected]>
>> +      5 Lars Kellogg-Stedman <[email protected]>
>> +      6 Mikhail Gusarov <[email protected]>
>> +      1 Olivier Berger <[email protected]>
>> +      1 Rolland Santimano <[email protected]>
>> +      3 Stewart Smith <[email protected]>
>> +      2 notmuch <[email protected]>
>> +     48 [email protected]
>> +      1 [email protected]
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>>  test_begin_subtest "sanitize output for quoted-printable line-breaks in 
>> author and subject"
>>  add_message "[subject]='two =?ISO-8859-1?Q?line=0A_subject?=
>>      headers'"
>> --
>> 2.1.1
>>
>> _______________________________________________
>> notmuch mailing list
>> [email protected]
>> http://notmuchmail.org/mailman/listinfo/notmuch
_______________________________________________
notmuch mailing list
[email protected]
http://notmuchmail.org/mailman/listinfo/notmuch

Re: [PATCH v2 2/4] cli: Extend the search command for --output=addresses and similar

Reply via email to