Hi GP,
First of all: I changed the order of the lines in my sample slightly, to
this:
—
MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
—
so that the 10 records are NOT already sorted.
Then I built the grep piece by piece in BBEdit's Pattern Playground, as
you suggested
[image: CleanShot 2025-04-07 at 06.14.52.png]
and made only a minor change to the "Replace pattern" so that the
semicolons are kept (see above).
The grep selects every single line of the sample data except the first.
Hurray!
That means that sorting the changed file will sort the lines the way I
wanted. After that, it would only be necessary to put the columns back in
their initial order.
Now that I know for sure 😉 that the grep works, I wanted to get "Sort
Lines…" working as well, so I put your grep into "Sort Lines…" again
[image: CleanShot 2025-04-07 at 06.17.42.png]
and also checked "Sorted lines to new document".
As you can see above, the lines were still NOT sorted after this (see the
ADRC_POST_CODE1 column marked in the screenshot above). In fact, they do
not differ at all from the original, as comparing the two front windows
shows:
[image: CleanShot 2025-04-07 at 06.16.43.png]
Did I still miss something?
Regards,
Vlad
On 28.03.2025 at 19:16, GP <gp-bbed...@hotmail.com> wrote:
Your Pattern Playground results are perplexing. Using your first post's
example CSV data, the grep:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
matches every line except the first line with the column labels.
To figure out what the problem might be on your system with your local
language configuration, use either BBEdit's Pattern Playground or regex101
and rebuild the grep pattern from scratch, from left to right, one
semicolon-delimited field pattern part at a time. E.g., start with \d{3};
which should find/highlight 7 matches in each line of the example CSV
data. Then add \w{3}; for a total grep of \d{3};\w{3}; which should result
in the leading 200;BAG; being highlighted for each line in the example.
Continue like that until the most recently added field pattern part fails
to match the left side of each line in the example data. The cause will be
something in that line's (or those lines') field/column that doesn't meet
the matching criteria of the pattern part you just added.
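For anyone who would rather script this left-to-right check than click through it, here is a minimal Python sketch. It assumes each record is available as one plain string; `PARTS` and `first_failing_part` are names made up for this illustration, and Python's `re` module only stands in for BBEdit's grep engine, so details may differ.

```python
import re

# The sort grep from this thread, split at the semicolons: one entry per field.
PARTS = [
    r"\d{3}", r"\w{3}", r"[^;]*", r"[^;]*", r"\d{10}",
    r"(\w{2})", r"(\d{2})", r"(\d{5})", r"([^;]*)", r"[^;]*",
    r"([^;]*)", r"([^;]*)", r"([^;]*)", r"[^;]*",
    r"\w{2}", r"\d{2}", r"\d{5}", r"[^;]*", r"\d{12}",
    r"[^;]*", r"[^;]*", r"\d{8}", r"[^;]*", r"\d{12}",
    r"[^;]*", r"[^;]*",
    r"\d", r"\d", r"\d", r"\d", r"\d", r"\d", r"\d", r"\d",
    r"([^;]*)", r"[^\n]*",
]


def first_failing_part(line):
    """Return the 1-based index of the first field part that breaks the
    match at the start of the line, or None if the whole pattern matches."""
    for count in range(1, len(PARTS) + 1):
        prefix = ";".join(PARTS[:count])
        if count < len(PARTS):
            prefix += ";"  # every field part except the last ends with a semicolon
        if re.match(prefix, line) is None:
            return count
    return None
```

A return value of 8, for example, would point at the ADRC_POST_CODE1 part `(\d{5})`, i.e. a postal code in that line that is not exactly five digits.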
In addition to sorting, another use of a working grep pattern is with
BBEdit's Text -> Process Lines Containing... to find all lines that do NOT
contain that grep pattern, which will help in finding malformed CSV data
in the large CSV data files you're working with.
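Outside BBEdit, the same "lines that do NOT match" check could look roughly like the following Python sketch. `find_malformed` is a hypothetical helper name, and Python's `re` module is used as a stand-in for BBEdit's grep, so edge cases may behave differently.

```python
import re

# The sort grep from this thread, unchanged.
SORT_GREP = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)


def find_malformed(lines):
    """Return (line_number, line) pairs for every line the grep does not match."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if SORT_GREP.search(line) is None
    ]
```

Note that the column-label line is reported too, since it does not match the pattern either.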
On Friday, March 28, 2025 at 7:12:03 AM UTC-7 Vlad Ghitulescu wrote:
Hey GP
I corrected the error regarding "Specific sub-patterns:", but this didn't
seem to change anything: ADRC_POST_CODE1 is still not sorted
<CleanShot 2025-03-28 at 10.02.07.png>
The command also gave no recognizable sign that it had finished, so I'm
not sure whether it also had problems with line 25816, where the CRLF
follows a house number (see previous emails).
However, BBEdit's Pattern Playground shows no result at all when searching
with the regex
<CleanShot 2025-03-28 at 10.09.51.png>
I'll take the regex to regex101 (thanks for the hint!) and see if I can
spot an error.
Regards,
Vlad
On 26.03.2025 at 19:42, GP <gp-bbed...@hotmail.com> wrote:
First, in your Sort Lines dialog screenshot, you need to select the
"Specific sub-patterns:" option instead of "Entire match" in order for the
lines to be sorted by your column sorting criteria (MSGNO, ADRC_COUNTRY,
ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and
ADRC_HOUSE_NUM1). Since the sort lines grep pattern:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
will match every line in your example, using the "Entire match" option
devolves the sort into a simple whole-line string sort, which would put
the MSGNO (i.e. \8 in the example) column contents last instead of first
in the sort order. (See the "Sort Lines" section in Chapter 5 of the
BBEdit User Manual for details of using sub-pattern sort ordering.)
With the "Entire match" option, if you look at the lines from line 2
onward, the left part of each line is the same until you get to the
ADRC_ADDRNUMBER characters, so the differences in that part of the string
are what Sort Lines' "Entire match" uses to determine the ordering of the
whole-line strings.
The "Specific sub-patterns:" option is what allows you to specify which
substring part(s) of a string/line are used, and in what order they are
concatenated, to determine the sort ordering between whole strings/lines.
To see what's going on with Sort Lines' "Specific sub-patterns:" option,
you can use BBEdit's Pattern Playground to see the concatenated substring
that is used for each line to determine line sort ordering. For "Search
pattern:" put:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
and for "Replace pattern" put:
\8\1\2\3\4\5\6\7
and for "Contents of" choose an open example file.
As you step through each grep pattern match (using the Next button), the
"Replacement text:" field will show you the concatenated string composed
from the capture-group-ordered substrings of the whole matched
string/line. It is that "Replacement text:" string that Sort Lines uses
for "Specific sub-patterns:" option sorting evaluation.
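To make the idea concrete, here is a rough Python sketch of what sorting on \8\1\2\3\4\5\6\7 amounts to. It assumes the first line holds the column labels and that every other line matches the grep. `sort_key` and `sort_records` are hypothetical names, plain code-point string comparison is used (BBEdit's own comparison rules may differ), and what BBEdit does with non-matching lines is not modeled here.

```python
import re

# The sort grep from this thread, unchanged.
SORT_GREP = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)


def sort_key(line):
    """Concatenate the capture groups in the order 8, 1, 2, 3, 4, 5, 6, 7,
    i.e. MSGNO first, then country, region, postal code, city, and so on."""
    match = SORT_GREP.search(line)
    if match is None:
        raise ValueError("line does not match the sort grep")
    return match.expand(r"\8\1\2\3\4\5\6\7")


def sort_records(lines):
    """Keep the column-label line first and sort the records by sort_key."""
    header, *records = lines
    return [header] + sorted(records, key=sort_key)
```

For the "Aying / Kaps" record of the example data, `sort_key` returns `DE0985655AyingKapsKaps`: MSGNO and ADRC_HOUSE_NUM1 are empty there, so they contribute nothing to the key.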
P.S. If an explanation of what the parts of a grep regular expression
specify would help, https://regex101.com has a pretty good explanation
panel that explains what each bit of a regular expression is doing.
On Wednesday, March 26, 2025 at 6:24:57 AM UTC-7 Vlad Ghitulescu wrote:
Hey GP
And thanks for the suggestion!
I tried the sort solution before trying to understand the regex itself 😶
I pasted it into Text -> Sort Lines… like this
[image: CleanShot 2025-03-26 at 08.24.24.png]
but after sorting it doesn't look like the postal code column was
taken into account
[image: CleanShot 2025-03-26 at 08.25.19.png]
Did I miss something?
Thanks again!
Regards,
Vlad
On 25.03.2025 at 22:32, GP <gp-bbed...@hotmail.com> wrote:
As a follow up...
BBEdit's Pattern Playground is a great help in constructing tedious grep
patterns like the ones you'll need for your filtering and sorting. The
really tedious part is finding the field position(s) you want to filter or
sort on, so you can modify that field's match pattern to conform to the
desired filter or sorting criteria.
For example... For your "Filter all lines that have ADR_CHK_KZ = 1", using
Text -> Process Lines Containing ... with the grep pattern:
\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*
will do the trick. For filtering you don't need the group capturing on the
1, but it is useful with Pattern Playground to verify you're getting the
right field position and field contents matched.
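As a cross-check outside BBEdit, the same filter can be sketched in Python. This is only an illustration: `lines_with_adr_chk_kz_1` is a hypothetical name, and Python's `re` module stands in for BBEdit's grep, so behaviour may differ in edge cases.

```python
import re

# The filter grep from this message, unchanged; the captured (1) is ADR_CHK_KZ.
FILTER_GREP = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*"
)


def lines_with_adr_chk_kz_1(lines):
    """Yield only the lines whose ADR_CHK_KZ field (the 34th column) is 1."""
    for line in lines:
        if FILTER_GREP.search(line):
            yield line
```

Because it is a generator, it can be fed an open file object line by line, which matters for files of several hundred megabytes.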
For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION,
ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1",
using Text -> Sort Lines ... with a grep pattern of:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
with "Specific sub-patterns" selected and \8\1\2\3\4\5\6\7 in the fill-in
field will sort your example text using your desired field ordering.
On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
For filtering, look at Text -> Process Lines Containing ... and for
sorting Text -> Sort Lines ... using grep patterns to identify what you
want to match for filtering and what subpattern field or fields you want
the sort ordered on.
If the number of fields in your sample is representative of the real CSV
files you're working with, it is going to be something of a pain in the
rear coming up with the grep patterns needed to accomplish the desired
filtering and sorting.
On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
Hey,
I use BBEdit very often while working with big CSV files (300 - 500 MB, up
to 4 million rows) looking like this:
MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
Once in a while I'd like to filter or sort such huge files by one or more
columns, like:
1. Filter all lines that have ADR_CHK_KZ = 1 or
2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1,
ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1.
Is there a way to do these kinds of tasks with BBEdit?
Thanks!
Regards,
Vlad
--
This is the BBEdit Talk public discussion group. If you have a feature
request or believe that the application isn't working correctly, please
email "sup...@barebones.com" rather than posting here. Follow @bbedit on
Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to bbedit+un...@googlegroups.com.
To view this discussion visit
https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com