Re: [gentoo-user] OT: Extracting year from data, but honour empty lines

2018-05-11 Thread R0b0t1
On Fri, May 11, 2018 at 6:16 PM, Daniel Frey  wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted -MM-DD and MM-DD-.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>

Use awk or perl and when the line matches the pattern ^\s*$ print a
blank line. Otherwise, apply the normal pattern.
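
A minimal sketch of the awk route, offered as one possible reading of the suggestion rather than a definitive script. The function name extract_years is hypothetical. Instead of testing for ^\s*$ separately, it prints an empty line whenever no four-digit run is found, which covers blank lines and lines without a year alike:

```shell
# Hypothetical helper: print the first four-digit run on each line,
# or an empty line when there is none, so the line count is preserved.
extract_years() {
    awk '{
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)   # the matched year
        else
            print ""                            # placeholder for "no year"
    }' "$@"
}

# Demo on a few lines in the style of the sample data.
printf 'December 2, 1994\n\nMarch 1995\n' | extract_years
```

Called with a file name (extract_years dates.txt, where dates.txt is an assumed name) it reads that file instead of standard input.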

Cheers,
 R0b0t1



Re: [gentoo-user] OT: Extracting year from data, but honour empty lines

2018-05-11 Thread Floyd Anderson


Hi Daniel,

On Fri, 11 May 2018 16:16:52 -0700
Daniel Frey  wrote:


> […]
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!


if you can ensure there is only one four-digit year per line, try to 
strip all other line characters with:


   $ sed -e 's/.*\([0-9]\{4\}\).*/\1/' /path/to/your-date-file

while keeping non-matching lines as they are. Note that the leading .* is
greedy, so the pattern picks up the last four-digit run it finds on a line.
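
Because non-matching lines pass through unchanged, a line of text without a year would show up verbatim in the output. The sketch below is one way to blank such lines; the second -e expression and the function name last_year_or_blank are additions for illustration, not part of the command above:

```shell
# Hypothetical wrapper: the first expression is the sed command from above;
# the second (an addition) empties every line that is not exactly four digits.
last_year_or_blank() {
    sed -e 's/.*\([0-9]\{4\}\).*/\1/' \
        -e '/^[0-9]\{4\}$/!s/.*//' "$@"
}

# Demo: a trailing year, two years on one line (the last one wins),
# a line with no year, and an empty line.
printf '12-23-1994\n1991 to 1995\nunknown\n\n' | last_year_or_blank
```

The output keeps one line per input line, which is what matters for lining the data up later.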



--
Regards,
floyd




Re: [gentoo-user] OT: Extracting year from data, but honour empty lines

2018-05-11 Thread Paul Colquhoun
On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
> 
> Dan


You can add an alternate regular expression that matches the blank lines, but 
the '-o' switch will still stop that match from being printed as it is an 
'empty' match. The trick is to modify the data on the fly to add a space to the 
empty lines. I have also added the '-E' switch to make the regular expression 
easier.

sed -e 's/^$/ /'  YOUR_DATA_FILE  | grep -o -E '([0-9]{4}|^[[:space:]]*$)'
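
One side effect of the pipeline above is that the formerly empty lines come out containing a single space. If that matters, a trailing sed can strip it again. The sketch below wraps the pipeline in a hypothetical function, years_keep_blanks; the last stage is an addition. Two caveats that still apply: a line with text but no year is dropped, and a line with two years yields two output lines, because -o prints each match on its own line.

```shell
# Hypothetical wrapper around the pipeline above.
years_keep_blanks() {
    sed -e 's/^$/ /' "$@" \
        | grep -o -E '([0-9]{4}|^[[:space:]]*$)' \
        | sed -e 's/^[[:space:]]*$//'   # addition: turn " " back into ""
}

# Demo: two dated lines separated by two empty lines.
printf 'June 4, 1994\n\n\nMarch 1995\n' | years_keep_blanks
```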


-- 
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
 http://catb.org/~esr/faqs/smart-questions.html#intro






Re: [gentoo-user] OT: Extracting year from data, but honour empty lines

2018-05-11 Thread Andrew Udvare


> On May 11, 2018, at 7:16 PM, Daniel Frey  wrote:
> 
> Hi all,
> 
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
> 
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
> 
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
> 
> 
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---

While loop in Bash? This is slower but it will do it:

while IFS=$'\n' read -r line; do
  if [ -z "$line" ]; then echo; fi
  grep -o '\([0-9]\{4\}\)' <<< "$line"
done < input_file
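
Most of the slowness comes from starting one grep process per line. A rough sketch of the same loop using only shell built-ins follows; the function name years_builtin is hypothetical, and it relies on POSIX parameter expansion, so it should work in Bash as well as in plain sh. Unlike the loop above, it also prints a blank line for non-empty lines that contain no year:

```shell
# Hypothetical helper: same loop shape, but no external process per line.
years_builtin() {
    while IFS= read -r line; do
        case $line in
            *[0-9][0-9][0-9][0-9]*)
                # Drop everything before the first run of four digits...
                rest=${line#"${line%%[0-9][0-9][0-9][0-9]*}"}
                # ...then keep only the first four characters of the rest.
                printf '%s\n' "${rest%"${rest#????}"}"
                ;;
            *)
                # No year on this line: emit a blank line to keep alignment.
                echo
                ;;
        esac
    done
}

# Demo: dated line, empty line, line without a year, ISO-style date.
printf 'May 17, 1991\n\nno date here\n1992-08-07\n' | years_builtin
```

To run it on a file, redirect it in the same way as the loop above: years_builtin < input_file.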

I would consider using a date-parsing library in another language, such as
Python’s datetime, where you can specify the accepted formats, try each one in
a loop, and output a blank line if none of them matches.

Andrew


[gentoo-user] OT: Extracting year from data, but honour empty lines

2018-05-11 Thread Daniel Frey
Hi all,

I am trying to do something relatively simple and I've had something
working in the past, but my brain just doesn't want to work today.

I have a text file with the following (this is just a subset of about
2500 dates, and I don't want to edit these all by hand if I can avoid it):

--- START ---
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995
--- END ---

As you can see, there's no standard in the way the date is formatted.
Some of them are also formatted -MM-DD and MM-DD-.

I have a basic grep that I tossed together:

grep -o '\([0-9]\{4\}\)'

This does extract the year but yields the following:

1994
1992
1994
1993
1992
1995
1993
1991
1991
1992
1994
1992
1995

As you can see, the two empty lines are removed but this will cause
problems with data not lining up later on.

Does anyone have a quick tip for my tired brain to make this work and
just output a blank line if there's no match? I swear I did this months
ago and had something working but I apparently didn't bother saving the
script I made. Argh!

Dan