On 4/23/24 14:32, Dale R. Worley wrote:
> At least once a week, and often several times a day ...
Dear Sir:

This is a task I can certainly relate to. Dragging through massive
storage servers with find and grep is a terrible way to get things done.

> I want to search a tree of files to list the files in a directory
> containing a pattern ...

That is usually the easy part of the problem.

> along with the *numbers* of patterns in the files.

That is not the easy part.

> Usually this is because I'm looking for a file that contains a number
> of instances of the pattern, from among which I will choose to copy
> something.

Perhaps a specific example would be helpful. Do you mean that you run
"find" on a directory "./foo" and search for all filenames that contain
the case-sensitive pattern "BaR"? Then, within that result set, you count
the instances of the string "BaR" inside each matching file? Are you only
searching plain text files, or will there be multilingual UTF-8 encoded
files as well? What about matching binary bit patterns?

> But often the total number of files to be examined is large, and the
> total number of matches in any file might also be large.

Here the word "large" could mean tens of millions of files, or perhaps
even billions or trillions. I am not sure what "large" means to you, but
this is certainly within reach of a decent modern server.

> So "grep -r" is inconvenient, because it may return many more matches
> than I want to examine, and it can be hard to see what all the
> alternative files are among the large number of matches that can be
> returned from any one file.
>

Without really understanding the problem you are trying to solve, I have
the sudden feeling that what you really want is a custom-written bit of
code that walks down the directory structure and then reads and inspects
each file whose name matches some pattern. Changing grep for that purpose
feels like modifying a good working hammer in order to produce a chainsaw.
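A minimal sketch of that kind of custom tree walker, in Python, under a
few assumptions: the content pattern is a byte-level regular expression
(so plain text, UTF-8 and binary files are all handled the same way), an
optional case-sensitive glob filters the filenames, and each file fits in
memory. It counts every non-overlapping match rather than matching lines,
which is what `grep -c` reports. The names `count_matches` and
`sorted_matches` are hypothetical, chosen only for this illustration.

```python
import fnmatch
import os
import re
from typing import Iterator, List, Tuple


def count_matches(root: str, pattern: bytes, name_glob: str = "*",
                  min_count: int = 1) -> Iterator[Tuple[str, int]]:
    """Walk `root` and yield (path, count) for each regular file whose name
    matches `name_glob` and whose contents hold at least `min_count`
    non-overlapping matches of the byte regex `pattern`."""
    regex = re.compile(pattern)  # bytes regex: works on text and binary alike
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Filename filter: case-sensitive glob, similar to find -name.
            if not fnmatch.fnmatchcase(name, name_glob):
                continue
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets, device nodes and dangling symlinks.
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    data = f.read()  # assumption: the file fits in memory
            except OSError:
                continue  # unreadable or vanished file: skip it
            # Count every match, not just the lines that contain one.
            n = sum(1 for _ in regex.finditer(data))
            if n >= min_count:
                yield path, n


def sorted_matches(root: str, pattern: bytes, name_glob: str = "*",
                   min_count: int = 1) -> List[Tuple[str, int]]:
    """Return the results of count_matches, most matches first."""
    return sorted(count_matches(root, pattern, name_glob, min_count),
                  key=lambda pc: (-pc[1], pc[0]))
```

For example, `for path, n in sorted_matches("./foo", rb"BaR", min_count=12): print(n, path)`
would list the files under ./foo holding twelve or more occurrences of
"BaR", busiest first, one line per file instead of one line per match.
For truly huge files a streaming or mmap-based reader would be the next
step, at the cost of handling matches that straddle chunk boundaries.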
However, I am not sure what you mean by counting "instances of the
pattern". I have to guess that you want any filename with a pattern match
AND twelve, or fifty thousand, instances of that pattern within the
contents of the file.

>
> What do people think?
>
> Dale

I think I want to set up an experiment and test this problem.

-- 
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken