Linux-Development-Apps Digest #384, Volume #7     Mon, 7 May 01 18:13:09 EDT

Contents:
  Re: How to get a number of processors ("Norman Black")
  Accessing PCI I/O port ? ("Cédric Willot")
  Re: split a file into multi-files (Jim Cochrane)
  Faster than strstr (DB)
  Re: Accessing PCI I/O port ? ("Norm Dresner")
  Adding default libraries to every link ("Norm Dresner")
  Can not set SO_SNDTIMEO using setsockopt, anyone ? ([EMAIL PROTECTED])
  Re: How to get a number of processors (Stefaan A Eeckels)
  Re: How to get a number of processors (Stefaan A Eeckels)
  Re: How to get a number of processors (Stefaan A Eeckels)
  help! file read problem (yan zhang)

----------------------------------------------------------------------------

From: "Norman Black" <[EMAIL PROTECTED]>
Crossposted-To: comp.os.linux.development.system
Subject: Re: How to get a number of processors
Date: Mon, 7 May 2001 13:13:38 -0700
Reply-To: "Norman Black" <[EMAIL PROTECTED]>

I use  _SC_NPROCESSORS_ONLN on both Linux and Solaris.
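
A minimal usage sketch (my illustration, not Norman's posted code; note
that sysconf() returns a long, and -1 when the option is unsupported):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Processors currently online; -1 if the option is unsupported. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    if (n < 1)
        n = 1;              /* conservative fallback: assume one CPU */
    printf("%ld processors online\n", n);
    return 0;
}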

--
Norman Black
Stony Brook Software
the reply, fubar => ix.netcom

"Greg Copeland" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
>
> Oddly enough, my man page for sysconf doesn't show the
> _SC_NPROCESSORS_CONF option.  Hmm....wonder how long it's been
> around.  Does it exist for Linux?
>
> Thanks,
> Greg
>
>
> Chris <[EMAIL PROTECTED]> writes:
>
> > Hong Hsu wrote:
> > >
> > >    Hi all,
> > >
> > >  Here is my quick question.  My application needs to know how many
> > > processors are running in the host machine.  Is there an API which
> > > allows me to get the number of processors?
> >
> > sysconf(2); /usr/include/bits/confname.h
> >
> > i.e.
> >
> > printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> >
> > --
> > Chris Lightfoot -- chris at ex dash parrot dot com -- www.ex-parrot.com/~chris/
> >  Sometimes I lie awake at night and ask ``Why me?'' Then a voice
> >  answers ``Nothing personal, your name just happened to come up.''
> >  (Charlie Brown, from `Peanuts', by Charles Schulz)
>
> --
> Greg Copeland, Principal Consultant
> Copeland Computer Consulting
> --------------------------------------------------
> PGP/GPG Key at http://www.keyserver.net
> DE5E 6F1D 0B51 6758 A5D7  7DFE D785 A386 BD11 4FCD
> --------------------------------------------------


------------------------------

From: "Cédric Willot" <[EMAIL PROTECTED]>
Subject: Accessing PCI I/O port ?
Date: Mon, 7 May 2001 22:18:46 +0200

I have a problem :-)

With cat /proc/pci, I get the following information about my PCI
card:

      Bus  2, device  13, function  0:
        Unknown class: Unknown vendor Unknown device (rev 1).
        Vendor id=1402. Device id=960.
        Medium devsel.  Fast back-to-back capable.  IRQ 5.
        I/O at 0xd400 [0xd401].
        I/O at 0xd800 [0xd801].
        I/O at 0xdc00 [0xdc01].

And here is the init_module of the driver module, cp380:

int init_module (void)
{
    request_region (0xd400,1,"cp380 port A");
    request_region (0xd800,1,"cp380 port B");
    request_region (0xdc00,1,"cp380 port C");

    register_chrdev (32, "cp380", &cp380_fops);

    pcibios_write_config_byte ( 2 , 0 , PCI_BASE_ADDRESS_0 , 0xff );
}


I have connected LEDs to the appropriate pins, with a correct power
supply and so on ... but nothing happens; the LEDs remain off!
P.S.: I get no error message.

Thanks in advance,

Ced.



------------------------------

From: [EMAIL PROTECTED] (Jim Cochrane)
Subject: Re: split a file into multi-files
Date: 7 May 2001 14:59:51 -0600

I'm not going to give it all away - you'll learn more if you take on some
of the challenge yourself, but here are a few tips (below).

In article <9d685r$[EMAIL PROTECTED]>,
Eric Chow <[EMAIL PROTECTED]> wrote:
>Hello,
>
>Would you please teach me how I can split a file into different files in a
>shell script?
>
>For example,
>
>index.dat
>========
>001   12
>002   25
>003   08
>004   12
>005   25
>006   08
>007   02

There are two parts to solving this: 1. understanding what specific tasks need
to be done to accomplish what you want and 2. finding the right tool(s)
to do each task and how to use the tool to do it.

The tasks that need to be done are:

1. Get a list of unique group keys (12, 25, 08, ...) - they will be
used to create the file names and to determine which group-related data
goes into which file.  We'll call the list group_keys.

2. For each key k in group_keys, associate k with all sort-keys that are
associated with k in index.dat.  A good way to do this is with a hash
table (AKA associative array) where the key is the group-key and the
associated sort keys form a list associated with the key.  We'll call this
table the group_sort (short, perhaps, for group-key/sort-key relation) table.

3. For each group key k in group_keys, with list l = group_sort[k]:
       outfile = result-k.dat
       For each sort key j in l:
           output fields 2 .. n of content.dat whose key matches j into outfile

That's it.  We'll use bash for the implementation.

For step 1, one way to do this is with sed and sort (assuming the fields
are separated by spaces):

for key in $(sed 's/.* //' index.dat | sort -u); do
        outfile=result-${key}.dat
        process_group $key $outfile
done

process_group is a shell function:

# Usage: process_group group_key outputfile
process_group() {
        # For each sort key i associated with group key $1, output the data for i.
        # Anchor the match so that e.g. group key 02 cannot match sort key 002.
        for i in $(sed -n "/ $1\$/s/  *.*//p" index.dat); do
                output_data $i $2
        done
}

output_data is a shell function:

# Usage: output_data sort_key outputfile
output_data() {
        # Output fields 2 .. n of content.dat whose key matches $1 into $2
}

I'll leave the implementation of output_data for you to have fun with.

Note that I skipped creating the hash table in the list of tasks and
instead nested the processing of sort keys for the current group key
inside the main loop.  I did this because it seemed more directly
implementable in the shell.  However, the hash table solution would work
well with the appropriate tool - tools that come to mind that could
handle this well are awk, perl, python, or ruby (see the sketch below).
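
For instance, a possible awk version of the hash-table approach (my
illustration, untested; it assumes whitespace-separated fields in
index.dat and content.dat, as in your example data):

awk 'NR == FNR { group[$1] = $2; next }   # pass 1: index.dat, sort-key -> group-key
     $1 in group {                        # pass 2: content.dat
         out = "result-" group[$1] ".dat"
         sub(/^[^ ]+ +/, "")              # drop the leading sort-key field
         print > out
     }' index.dat content.dat

Here group[] is the hash table: pass 1 maps each sort key to its group
key, and pass 2 routes each content.dat line into the result file for
its group.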

The first sed command, as you may have guessed, uses a regular expression
to remove all characters up to the last space, leaving just the 2nd field.
The second sed command selects the lines matching $1 (the passed-in group
key) and removes the spaces and the 2nd field, leaving the 1st field.
I think you should be able to use a similar sed command to implement
output_data.  You might want to read up on sed and on regular expressions.

Have fun.

>
>
>content.dat
>=========
>001 aaaaaa AAAAA .....
>001 bbbbb BBBBBB .....
>001 cccccc CCCCCCC ....... some other contents
>002 .... another content for 002 ....
>002 ... 002 datas ...
>003 ....... 003 .....
>003 .. This is another 003 data ...
>004 ..... 004 ....
>005 ... 005 ...
>006 .... 006 ....
>007 ... 007 1 ...
>007 ... 007 2 ...
>
>result-12.dat
>==========
>aaaaaa AAAAA .....
>bbbbb BBBBBB .....
>cccccc CCCCCCC ....... some other contents
>..... 004 ....
>
>result-02.dat
>==========
>... 007 1 ...
>... 007 2 ...
>
>
>result-08.dat
>==========
>....... 003 .....
> .. This is another 003 data ...
>
>
>result-25.dat
>==========
>.... another content for 002 ....
>... 002 datas ...
>... 005 ...
>
>
>As the above example shows, there are two files.  One is "index.dat" and
>the other is "content.dat".  Would you please teach me how to write a
>shell-script to produce the other 4 files, "result-12.dat",
>"result-08.dat", "result-02.dat" and "result-25.dat"?
>
>In the "index.dat", the first column is a Sort-Key, and the second column is
>Group-Key. In those result files, all the contents will be the combination
>of Group-Key.
>
>Look at "result-12.dat": the Sort-Keys "001" and "004" have the same
>Group-Key "12", so the result file "result-12.dat" contains all the
>lines for Sort-Key "001" and all the lines for Sort-Key "004".
>
>Since I am fairly new to shell scripting, would you please show me a
>simple example of how to do that?
>
>Best regards,
>Eric


-- 
Jim Cochrane
[EMAIL PROTECTED]

------------------------------

Subject: Faster than strstr
From: DB <[EMAIL PROTECTED]>
Crossposted-To: comp.lang.c.moderated,comp.lang.c
Date: 07 May 2001 21:25:09 GMT

My current project for SPI Dynamics made extensive use of the strstr()
function to detect malicious activity in web server log files.  That
search routine was quite a bottleneck while LogAlert was detecting
exploit strings.  Since the strstr() function seemed to be part of the
problem, I had to find a way to speed things up.

I found strchr() to be the fastest search available on the x86, so it
became the front-end to my new search routine.  The result is
strchrstr(), which uses a fast strchr() to find potential matches and
then does a strncmp() on the remainder of the target string.

The new function is a direct replacement for strstr(). Since this
approach made quite a difference, I made another change, and did a
second character compare before the strncmp(). This is strchrstr2(),
which is NOT an exact replacement for strstr(), because the target
string must be at least 2 characters long.

TESTING:
The search space is a 9 MB text file, searched for a nine-character
target string, looped 255 times.  The tests were run on RHL 6.2
and 7.0.  The kernel builds were 2.2.14 and 2.2.16, with glibc versions
2.1.3 and 2.1.92 respectively.

 system   | strstr | strchrstr | strchrstr2 |
==========|========|===========|============| (timed with gettimeofday())
 5x86-160 |  195.5 |     141.4 |      121.9 | RHL 6.2
 P-166    |   97.7 |      58.9 |       52.3 | RHL 7.0
 Ath-840  |   18.5 |      16.8 |       15.5 | RHL 6.2

Links to the source code for all three test programs and the alternate 
search routine are at: http://www.SPIDynamics.com/speed_search.html

/* Fast, compatible strstr() replacement. */
char *strchrstr(char *haystack, char *needle)
{
    size_t sl;

    if (needle && haystack)
    {
        sl = strlen(needle);
        if (!sl)
            return haystack;        /* empty needle: match at the start */
        sl--;
        while (1)
        {
            /* Let strchr() locate candidates for the first character. */
            haystack = strchr(haystack, *needle);
            if (!haystack)
                break;              /* first character not found at all */
            if (!sl)
                return haystack;    /* one-character needle */
            /* Confirm the rest of the needle at this candidate. */
            if (!strncmp(haystack + 1, needle + 1, sl))
                return haystack;
            haystack++;
        }
    }
    return NULL;
}
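
The posted strchrstr2() source is at the URL above.  As a rough
illustration of the idea described (a second-character test before the
strncmp()), a reconstruction might look like this - my sketch, not the
author's posted code:

/* Hypothetical reconstruction of strchrstr2(): test the second
 * character before calling strncmp().  NOT an exact strstr()
 * replacement; the needle must be at least 2 characters long. */
#include <string.h>

char *strchrstr2(char *haystack, char *needle)
{
    size_t sl;

    if (!needle || !haystack)
        return NULL;
    sl = strlen(needle);
    if (sl < 2)
        return NULL;                /* caller must pass >= 2 characters */
    sl -= 2;
    while ((haystack = strchr(haystack, needle[0])) != NULL)
    {
        /* The cheap second-character test rejects most candidates. */
        if (haystack[1] == needle[1] &&
            !strncmp(haystack + 2, needle + 2, sl))
            return haystack;
        haystack++;
    }
    return NULL;
}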

DB
[EMAIL PROTECTED]
-- 
comp.lang.c.moderated - moderation address: [EMAIL PROTECTED]

------------------------------

From: "Norm Dresner" <[EMAIL PROTECTED]>
Subject: Re: Accessing PCI I/O port ?
Date: Mon, 07 May 2001 21:25:43 GMT

Cédric Willot <[EMAIL PROTECTED]> wrote in message
news:3af70407$0$43053$[EMAIL PROTECTED]...
> I have a problem :-)
>
> With cat /proc/pci, I get the following information about my PCI
> card:
>
>       Bus  2, device  13, function  0:
>         Unknown class: Unknown vendor Unknown device (rev 1).
>         Vendor id=1402. Device id=960.
>         Medium devsel.  Fast back-to-back capable.  IRQ 5.
>         I/O at 0xd400 [0xd401].
>         I/O at 0xd800 [0xd801].
>         I/O at 0xdc00 [0xdc01].
>
> And here is the init_module of the driver module, cp380:
>
> int init_module (void)
> {
>     request_region (0xd400,1,"cp380 port A");
>     request_region (0xd800,1,"cp380 port B");
>     request_region (0xdc00,1,"cp380 port C");
>
>     register_chrdev (32, "cp380", &cp380_fops);
>
>     pcibios_write_config_byte ( 2 , 0 , PCI_BASE_ADDRESS_0 , 0xff );

I hope you don't think that the line above is writing to a port.  It's
not; it's writing to the configuration region of the PCI card.
Specifically, it's writing 0xFF to the first byte of PCI_BASE_ADDRESS_0.
I doubt this is what you want.  You might want to look up the
documentation for functions like inb() and outb().

You should also be calling check_region() before you call request_region().
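
For illustration, a minimal sketch of driving one of those ports on a
2.2-era kernel (my sketch, using the port addresses from your output;
not tested against the cp380 hardware):

#include <linux/module.h>
#include <linux/ioport.h>
#include <linux/errno.h>
#include <asm/io.h>

#define CP380_PORT_A 0xd400

int init_module(void)
{
    /* Reserve the port first; fail if something else owns it. */
    if (check_region(CP380_PORT_A, 1))
        return -EBUSY;
    request_region(CP380_PORT_A, 1, "cp380 port A");

    /* Drive the I/O port itself, not PCI configuration space. */
    outb(0xff, CP380_PORT_A);
    return 0;
}

void cleanup_module(void)
{
    release_region(CP380_PORT_A, 1);
}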

        Norm

> }
>
>
> I have connected LEDs to the appropriate pins, with a correct power
> supply and so on ... but nothing happens; the LEDs remain off!
> P.S.: I get no error message
>
> Thanks in advance,
>
> Ced.
>
>



------------------------------

From: "Norm Dresner" <[EMAIL PROTECTED]>
Subject: Adding default libraries to every link
Date: Mon, 07 May 2001 21:30:53 GMT

In other operating systems it's almost trivial to specify to the linker a
list of additional libraries that should be searched for each linking
operation.  For example, in DEC's VMS for VAX-like systems, it's nothing
more than (the equivalent of) defining additional entries in the
environment.

I haven't been able to find anything like that for Linux/gcc.  The
closest I've come is to alias the compiler's name, say gcc, to something
like 'gcc -Wall -g -O2 -llocal', which does cause the linker to search
the library liblocal for each link.  But then, when all I'm doing is
compiling a module (the -c option to gcc), I get a warning that the
linker library wasn't used.

Apart from creating two different aliases, one for compiling and the other
for (compiling and) linking, is there any (reasonable) way to specify that
one or more additional libraries are to be searched?
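
One possible refinement of the alias approach, sketched here as a
POSIX-shell wrapper (my illustration; 'local' stands in for whatever
library you want searched), is to add -llocal only when gcc will
actually link:

#!/bin/sh
# gcc wrapper: append -llocal only when a link will happen,
# i.e. when none of -c/-S/-E appear among the arguments.
link=yes
for arg in "$@"; do
    case $arg in
        -c|-S|-E) link=no ;;
    esac
done
if [ "$link" = yes ]; then
    exec gcc -Wall -g -O2 "$@" -llocal
else
    exec gcc -Wall -g -O2 "$@"
fi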

Thanks
    Norm




------------------------------

From: [EMAIL PROTECTED]
Subject: Can not set SO_SNDTIMEO using setsockopt, anyone ?
Date: Mon, 07 May 2001 21:46:12 GMT


Can not set SO_SNDTIMEO using setsockopt, anyone ?

I am trying to set the socket send-timeout option.
Every time I set it, setsockopt() returns -1.

What am I doing wrong...?

Here is the code cut out of my source:

- cut code -------------------------------------------------------

sock = socket(AF_INET, SOCK_STREAM, 0);

struct timeval tv;
bzero(&tv, sizeof(tv));
tv.tv_sec = 200;
tv.tv_usec = 0;

if (setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (const void *) &tv,
               sizeof(tv)) == -1) {
    exit(1);
}

- cut code -------------------------------------------------------

But it exits every time, since I am getting a "-1".
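
One thing worth checking (my suggestion, not part of the original post)
is errno, which says why setsockopt() failed; on some older Linux
kernels the send/receive timeouts were read-only, and setting them
failed with ENOPROTOOPT:

/* Diagnostic fragment; assumes <stdio.h> and <errno.h> are included. */
if (setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (const void *) &tv,
               sizeof(tv)) == -1) {
    perror("setsockopt(SO_SNDTIMEO)");  /* e.g. ENOPROTOOPT if unsupported */
    exit(1);
}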

Thanks in advance,

Sean.

[EMAIL PROTECTED]


------------------------------

From: [EMAIL PROTECTED] (Stefaan A Eeckels)
Subject: Re: How to get a number of processors
Crossposted-To: comp.os.linux.development.system
Date: Mon, 7 May 2001 22:53:25 +0200

In article <[EMAIL PROTECTED]>,
        John Beardmore <[EMAIL PROTECTED]> writes:
> In message <[EMAIL PROTECTED]>, Stefaan A Eeckels 
> <[EMAIL PROTECTED]> writes
>>In article <[EMAIL PROTECTED]>,
>>       [EMAIL PROTECTED] (Dave Blake) writes:
>>> Eric P. McCoy <[EMAIL PROTECTED]> wrote:
>>>
>>>> This strikes me as a battle of bad ideas: I hate writing a
>>>> text parser to deal with /proc; I don't like using nonstandard
>>>> pieces of code; and no program should ever need to know how
>>>> many processors are in a given box.  There are cases where
>>>> you'd want to use one or all of these bad ideas, but I, for
>>>> one, would need a pressing reason.
>>>
>>> Suppose I am writing a data crunching piece of software that
>>> parallelizes easily, and wish to run a thread on each processor.
>>>
>>> I first parse /proc/stat, and then crunch away with a thread
>>> on each CPU.
>>>
>>> For a web searching program, you may wish to know the number of
>>> NICs and CPUs, and take the lower of the two as the number of
>>> threads to run. And so on.
>>
>>In a Unix system, the application should not need to know
>>anything about the hardware details.
> 
> <dOGMA aLERT !>
> 
> Well then, in that case, why don't YOU start the project to make gcc 
> exploit all possible opportunities for parallelism ?
 
Maybe because I don't have a need for them?

>> The recent obsession
>>with threads violates that basic tenet.
> 
> You sound as if you find threads morally objectionable, as opposed to 'just 
> another way to get the job done'.

No. It's a step back from the level of hardware abstraction
that set Unix apart from the OSes of its generation.  Worrying
about differences between hardware is typically the OS's job;
putting it in an application is bound to yield suboptimal, or
even wrong, results when the hardware doesn't match the
assumptions the application made.  And as applications typically
outlive hardware by many, many years (and maintenance programmers
couldn't care less), it's a bad strategy.

> 
>> If one wants to
>>squeeze the last ounce of performance from a box,
> 
> But it's not the 'last ounce' !  On some boxes it's most of the ounces !
> 
> 
>> don't
>>use an OS.
> 
> Oh balls.
> 
>      'Use an OS and a compiler that knows about parallelism'

Then you will get most of the ounces, but not the last ounce,
agreed.

And I also agree that the OS (and associated tools) should be
capable of using the hardware to a reasonable extent, and do so
in a way that ensures the application is not burdened with
irrelevant details such as the number of processors, the size
of memory, or, for that matter, the layout of the blocks on a
disk drive.

COBOL has statements that allow some control over the layout
and blocking factor of a file on disk.  Leaving this to the OS
wasn't an option, as OS and hardware _needed_ help to put the
file on disk in a not-too-inefficient manner.  Suggesting that
such control should be needed on today's disks would be met
with hoots of derision.

I'm merely suggesting that scheduling threads, worrying about
synchronization, and adapting the behaviour of an application
to the number of processors _should_ be handled by the OS, and
that the Unix approach is exactly to hide hardware details
from the applications (everything is a file, remember?).
> 
> might be a better assertion, but pausing briefly to live in the real 
> world, C is not terribly 'parallel aware', and gcc is what most of us 
> here want to work with.

Correct. And the solution is _not_ to add a syscall to Unix
to get at the number of processors, but to find a way to
avoid having to know how many processors we've got, just as
we don't need to know how fast a disk spins before we decide
how to open a file, or how many dots per inch a printer has
before we generate PostScript.

> I don't see any moral problem with C and Linux supporting threads.  Even 
> on single CPU machines this can speed up some IO operations with 
> simultaneous reads on more than one device for example.  I really don't 
> see the point in coming over all 'closed minded' about it.

Why do you interpret my remark as "closed-mindedness"?  Maybe
this is an area where Linux can further the state of the art,
and prove that the Unix paradigm can handle multi-processors
and threading in a better, more portable, and more transparent
way than POSIX threads and querying the number of CPUs in a box.
> 
> Now if you're going to make the number of threads equal the number of 
> processors for some crunching task, WTF is wrong with the OS making that 
> info available ??  It's only one integer and nobody's forcing YOU to use 
> it if you object on religious grounds.

Because on any multi-tasking box, with persistent service
processes and other applications present, this is an exercise
in misguided micro-optimization.  What you should tell the OS
is that you want to run as many threads as possible.  And maybe
you shouldn't have to worry about threads either; indicating
what you want the OS to run simultaneously should suffice.

-- 
Stefaan
-- 
How's it supposed to get the respect of management if you've got just
one guy working on the project?  It's much more impressive to have a
battery of programmers slaving away. -- Jeffrey Hobbs (comp.lang.tcl)

------------------------------

From: [EMAIL PROTECTED] (Stefaan A Eeckels)
Subject: Re: How to get a number of processors
Crossposted-To: comp.os.linux.development.system
Date: Mon, 7 May 2001 23:25:18 +0200

In article <[EMAIL PROTECTED]>,
        Greg Copeland <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Stefaan A Eeckels) writes:
> 
>> In article <[EMAIL PROTECTED]>,
>>      Greg Copeland <[EMAIL PROTECTED]> writes:
>> > 
>> > I strongly disagree with the assertion that no one would ever need to know
>> > how many processors there are in a system.  I have worked on some large UNIX
>> 
>> Maybe the OS should provide a service to ensure that certain
>> processes get a minimum of CPU time. The approach you describe
>> is a hack, and goes completely against the basic Unix concept
>> of hiding the hardware differences and specifics from the 
>> applications. 
> 
> I *completely* disagree with you.  You are unable to get past the simple fact that
> parallel computing has requirements above and beyond simple server programming,
> where you are using a pool of processes/threads or even a 1:1 process/thread per
> client.  Actually, let's talk about those for a second.  If you are no longer
> using a 1:1 model of process/thread per client, then it's safe to assume you've
> come to a scalability impasse where you decided that a pool of resources would
> scale better.  Why do you suppose you hit the wall and were forced to change
> models?  In the above cases, you are assuming that yours is the only process with
> significant priority, as obviously a heavily loaded system will be far from
> servicing clients and other applications fairly.  Once you realize that
> you may need to service a mix of these types of applications, you are once again
> forced to adopt another model.  This is hardly a hack.  This is the real world.
> Now then, *my* implementation is somewhat of a hack, but simply because the
> information was not readily available.  Keep in mind, this is one reason
> why some OSes provide facilities for processor affinity, which would allow
> a process to run, for example, on the first 4 processors and leave the other
> four free regardless of how many children or threads the parent creates.
> 
> I spelled out, using a real-world situation, why such a mechanism needs to exist.
> You simply said it's a hack and violates some made-up "tenet".  Please tell me
> how you would solve it.  Keep in mind this was on a project that was three years
> and 2 million overdue, and long delays of yet *another* complex system would
> more than likely result in the rolling of your head and/or arse.  Even without
> such constraints, I'd like to see your magical solution.  Saying the concept,
> which is clearly required, is a hack, without any supporting evidence, seems
> a pretty cheap way out to me.

Hit a raw nerve, eh?  I apologize.  You indeed spelled out why you needed
such a function using current technology.  I'm merely pointing out that,
at the level of processors and scheduling, you're doing the equivalent of
having different file systems on 5.25" and 8" disks, as good old Flex did.
When you're faced with the reality of solving a problem with a particular
set of tools, you might not think of fixing the tools before tackling the
job; but given the time and cost overruns you mention, it might not have
been a totally useless idea to consider upgrading the tools...

-- 
Stefaan
-- 
How's it supposed to get the respect of management if you've got just
one guy working on the project?  It's much more impressive to have a
battery of programmers slaving away. -- Jeffrey Hobbs (comp.lang.tcl)

------------------------------

From: [EMAIL PROTECTED] (Stefaan A Eeckels)
Subject: Re: How to get a number of processors
Crossposted-To: comp.os.linux.development.system
Date: Mon, 7 May 2001 23:27:18 +0200

In article <[EMAIL PROTECTED]>,
        John Beardmore <[EMAIL PROTECTED]> writes:
> In message <[EMAIL PROTECTED]>, Stefaan A Eeckels 
> <[EMAIL PROTECTED]> writes
> 
>>Maybe the OS should provide a service to ensure that certain
>>processes get a minimum of CPU time. The approach you describe
>>is a hack, and goes completely against the basic Unix concept
>>of hiding the hardware differences and specifics from the
>>applications.
> 
> Guaranteeing resources to processes may be a good tool, but it isn't an 
> alternative to knowing how many threads to create for a big crunching 
> job.  For that, you still need to know the number of available 
> processors.

Why not tell the OS you want the number of parallel instances
that is optimal for the hardware and job mix at hand?

-- 
Stefaan
-- 
How's it supposed to get the respect of management if you've got just
one guy working on the project?  It's much more impressive to have a
battery of programmers slaving away. -- Jeffrey Hobbs (comp.lang.tcl)

------------------------------

From: yan zhang <[EMAIL PROTECTED]>
Subject: help! file read problem
Date: Mon, 07 May 2001 16:53:54 -0500

I have a function called read_file_to_string.  In it, I tried to use
the file->f_op->read method.  I can open the file without problem, but
when I read it, the Linux system crashes.
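
A common cause of exactly this crash in 2.2/2.4-era modules is that
file->f_op->read() expects a user-space buffer pointer, so passing it a
kernel buffer fails the user-access checks.  A hedged sketch of the
usual workaround (assuming a kernel buffer buf of count bytes, inside
your read_file_to_string):

#include <asm/uaccess.h>

/* Temporarily widen the address limit so the user-space access
 * checks accept a kernel-space buffer. */
mm_segment_t old_fs = get_fs();
ssize_t bytes;

set_fs(KERNEL_DS);
bytes = file->f_op->read(file, buf, count, &file->f_pos);
set_fs(old_fs);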


------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: [EMAIL PROTECTED]

You can send mail to the entire list by posting to the
comp.os.linux.development.apps newsgroup.

Linux may be obtained via one of these FTP sites:
    ftp.funet.fi                                pub/Linux
    tsx-11.mit.edu                              pub/linux
    sunsite.unc.edu                             pub/Linux

End of Linux-Development-Apps Digest
******************************
