On Sun, 2010-06-06 at 15:37 -0500, Harry Putnam wrote:
> Brandon Vargo <brandon.va...@gmail.com> writes:
> 
> > As an example of how it works, suppose I am making a news website and
> > have a bunch of news posts, each of which has an author, category, and
> 
> Thank you, Brandon, for such a nice, thorough answer... Yeah, looks like
> I'm barking up the wrong tree.
> 
> I know about htdig.. Not much though.  As far as I remember, it didn't
> have much in the way of a search interface... something like google.
> Whereas webglimpse has a rich set of search terms, including some
> regular expressions and regular-expression-like operators... all the
> same tools as glimpse (and agrep).  So many, in fact, that it can be a
> bit daunting to try to become proficient with them.
> 
> Maybe you can enlighten me about htdig... it's been years since I tried
> htdig.

Sorry, it has been a while since I have used it as well.

> Even webglimpse fails though when it comes to trying to search for
> snippets of code like perl or C etc.  Nobody wants the sloth and cpu
> overhead of serious regular expression searching, and that may be the
> only (good) way to search for things like /,{,$,(,[,!,@ etc etc, as
> one would need to do to find types of code snippets. Also I guess it
> would be pretty hard to build an index with that in mind.

Certainly it is a hard problem to index for arbitrary regular
expressions. Even Google's code search [1] is not terribly good at it.
However, I also do not think it is something most people will want to
do. When I go to find code that I have written, I do not remember
variable names, lines of code, etc. that I can match with a regular
expression. Thus, that kind of search is pointless for me. I remember
what the code does, the project for which I wrote the code, and
approximately where the code is located within the project. I remember
function calls for libraries that I probably used. If I cannot find what
I am looking for, I use grep on the name of a function call I remember,
or I have a ctags file containing all the information I need about
function definitions.
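
As a rough sketch of the "grep on the name of a function call" approach,
assuming a throwaway source tree (the directory, the file names, and the
function parse_header below are all made up for illustration):

```shell
# Hypothetical source tree created only for this illustration.
src=$(mktemp -d)
mkdir -p "$src/lib"
printf '%s\n' 'int parse_header(const char *buf) { return 0; }' > "$src/lib/util.c"
printf '%s\n' 'int rc = parse_header(line);' > "$src/main.c"
printf '%s\n' 'int parse_headers_all(void);' > "$src/other.c"

# -r recurses, -n prints line numbers, -w matches whole words only,
# so parse_headers_all in other.c is not reported.
hits=$(grep -rnw 'parse_header' "$src" | sort)
echo "$hits"

rm -rf "$src"
```

Each output line has the form path:line:text, which is usually enough
to jump straight to the definition or call site in an editor.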

I suggest, for code, you just organize whatever you have in a sane
directory structure. Or, even better, you can put your code in a central
place using a version control system (SVN, git, hg, CVS, etc.), where it
is organized in a way that makes sense to you. After all, it sounds like
this is for your personal use, so use something that makes you happy.
Personally, I have a series of git repositories that I use to keep track
of my code and some of my documents.

> I keep thinking some good developer will come out with a tool aimed at
> websites like those that might be found on a home LAN (in scope)...
> where regular expression searching wouldn't be so far out.
> 
> Or maybe there just is no herd of people who are competent in regular
> expression searching, and hence no audience for such a tool.

I do not think the problem is a lack of people with knowledge of regular
expressions, but rather the lack of a need for such a product. Many
people, at least those I know, do not think "Oh, I want to search for
xyz; I'll write a regular expression to search for what I want across
all my data." Instead, they have a directory structure of organized
documents that makes finding that particular document or series of
documents on xyz easy. When that fails, there are the find and locate
commands for terminal users, which support regex searching in filenames;
desktop search tools such as Beagle [2]; and of course grep.
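
For instance, a filename search with find might look like the sketch
below. The directory layout and file names are invented for
illustration, and -regex is an extension found in GNU and BSD find
rather than part of POSIX, so treat it as an assumption about your
system:

```shell
# Hypothetical document tree created only for this illustration.
docs=$(mktemp -d)
mkdir -p "$docs/work" "$docs/personal"
touch "$docs/work/notes_2009.txt" "$docs/work/notes_2010.txt"
touch "$docs/personal/recipes.txt"

# -regex is matched against the whole path, hence the leading '.*/'.
found=$(find "$docs" -type f -regex '.*/notes_[0-9]*\.txt' | sort)
echo "$found"

rm -rf "$docs"
```

locate offers a similar regex mode on many systems, but it searches a
prebuilt database instead of walking the directory tree.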

Certainly it would be really nice to have a search tool that would
produce results for "show me all the code on this computer used for
validating HTTP POST requests in Python for a submitted HTML form,
preferably using Django." If you find one, let me know, as I would love
to try it. In the meantime, `grep -RE 'form|POST'
projects/python/django/project_xyz` works fairly well once I figure out
that what I want is probably in that directory. (grep -E, or egrep,
supports extended regular expressions; -R searches recursively.) Or, I
just go search through the documentation, if available.
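
A minimal sketch of that kind of search, run against a made-up project
tree (the paths and file contents are hypothetical; only the grep
options come from the command above, plus -l to list file names only):

```shell
# Hypothetical Django-style project created only for this illustration.
demo=$(mktemp -d)
mkdir -p "$demo/project_xyz/app"
printf '%s\n' 'def handle(request):' \
    '    if request.method == "POST":' \
    '        pass' > "$demo/project_xyz/app/views.py"
printf '%s\n' 'form = ContactForm(request.data)' > "$demo/project_xyz/app/forms.py"
printf '%s\n' 'nothing relevant here' > "$demo/project_xyz/README"

# -R recurses, -E enables extended regexes (so | means "or"),
# -l prints only the names of files containing a match.
matches=$(grep -RlE 'form|POST' "$demo/project_xyz" | sort)
echo "$matches"

rm -rf "$demo"
```

The match is case-sensitive, so 'form' will not match 'Form'; adding -i
widens the search at the cost of more noise.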

Maybe someone here can suggest something better for code searching.
For everything else, use Beagle/something similar or a web-based search
engine you can install locally if you really want to be able to search
through your documents. Maybe there is something better for that too; I
do not know. I still use directories and git repositories in said
directories, where appropriate, as it is more efficient for me. Of
course your mileage may vary.

[1]: http://www.google.com/codesearch
[2]: http://beagle-project.org/

Regards,

Brandon Vargo

