Yup, willing to help, sounds good.  I see this going as 3+ patches:

1) the initial LogHelper trait
2) scouring the existing code and refactoring it to the new logging
3) for anything patched but not yet on trunk, patching it over for logging
changes after it makes it to trunk (if any are needed)
4) a watchful eye moving forward: setting a coding guideline and fixing
anything that slips in

I will create a JIRA and am happy to get started on #1 & #2. I can help
with #3 after things show up in trunk (I can just script a grep after svn
update to catch old-style logging).
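
For what it's worth, that grep pass could be as simple as something like
this (a hypothetical sketch; the patterns are my guess at what counts as
"old style", and the helper name is made up):

```shell
# Hypothetical sketch of the post-"svn update" lint pass. The patterns
# are assumptions about what old-style logging looks like: explicit
# isDebugEnabled guards and direct logger.<level> calls.
find_old_logging() {
    dir="${1:-.}"
    grep -rn --include='*.scala' \
        -E 'isDebugEnabled|logger\.(trace|debug|info|warn|error)' "$dir"
}

# typical use after syncing trunk:
#   svn update && find_old_logging core/src
```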

For #3 and #4 the question is more one of maintenance: do we close the
ticket after #1 & #2, once a committer commits the patch to trunk, and then
open another JIRA ticket for #3 & #4 that we keep around until the next
release to keep up with the "lint"?

Not sure if any -1s are going to show up for this; if so, that would
definitely put a crimp in the effort. +1s are always good here too...
Maybe after the initial trait is in (#1) and I scour the existing code (#2)
we can get +1s on changing the style guidelines, which will make #3 and #4
easier moving forward. Worst case, if enough -1s show up, it is just a
small amount of time wasted and the code is not affected.
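
To make #1 concrete, here is a minimal sketch of the by-name idea
(illustrative only; the real LogHelper would delegate to the log4j Logger
rather than the boolean stub used here, and all names are placeholders):

```scala
// Illustrative sketch of the by-name logging trait idea. A boolean
// stands in for log4j's logger.isDebugEnabled so the sketch is
// self-contained. The point: msg is a by-name parameter, so the string
// concatenation only happens when the level is enabled, with no
// isDebugEnabled boilerplate at the call site.
trait LogHelper {
  var debugEnabled: Boolean = false // stand-in for logger.isDebugEnabled

  def debug(msg: => String): Unit =
    if (debugEnabled) println("DEBUG " + msg)
}

object LogHelperDemo extends LogHelper {
  var timesBuilt = 0 // counts how often the message was actually built

  def run(): Int = {
    debugEnabled = false
    debug { timesBuilt += 1; "expensive: " + timesBuilt } // skipped
    debugEnabled = true
    debug { timesBuilt += 1; "expensive: " + timesBuilt } // built once
    timesBuilt
  }
}
```

Calling LogHelperDemo.run() builds the message once, not twice, which is
exactly the laziness the isDebugEnabled guard buys, without the noise.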

On Sun, Nov 6, 2011 at 11:09 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Cool, I tried a similar test:
>  https://gist.github.com/1343069
>
> Here are the results I get for logging debug to stdout and redirecting to a
> file:
>  isDebugEnabled with logging = DEBUG: 9673.7 ns per invocation
>  Debug helper with logging = DEBUG: 10248.6 ns per invocation
>  isDebugEnabled with logging = INFO: 22.3 ns per invocation
>  Debug helper with logging = INFO: 17.2 ns per invocation
>
> isDebugEnabled is the explicit check and "debug helper" is the lazy value.
> The important case is logging = INFO, as that is the case you are trying
> to optimize with the isDebugEnabled check. There really is no difference
> in the debug helper case there; it looks like, at least for this
> microbenchmark, the JVM can optimize it away. There does seem to be some
> overhead for the helper method when logging fires, as it is persistently
> around 500ns slower, but I don't think that is a problem since the logging
> itself is 10us.
>
> Thanks for this pointer! I recommend we definitely move to this more
> readable style. If you are willing to help with the conversion that would
> of course be fantastic.
>
> -Jay
>
> On Sat, Nov 5, 2011 at 10:44 PM, Joe Stein <crypt...@gmail.com> wrote:
>
> > So, to check whether the possible new log style is performance
> > hindering, I created some test code
> >
> >
> > https://github.com/joestein/qetesh/blob/master/src/main/scala/logstyletest.scala
> > to simulate how it would act.
> >
> > It can be run as a standalone jar; just clone the project and then, in
> > qetesh, run:
> >
> > java -jar target/scala-2.9.1/logstyletest_2.9.1-1.0.0.0-one-jar.jar 1000
> > 100000 10
> >
> > The first param is how many threads in the pool, the second is how many
> > pushes to the pool, and the third is how many times to repeat the test
> > (since it will be bumping into things not related to the test that slow
> > it down, as it contends for resources).
> >
> > Running on real hardware and on EC2, with both large and very small
> > numbers, I do not see any overhead from the log style.
> >
> > Let me know if/how you want to proceed, any questions about how the test
> > is put together, or if you want something else covered in the test. I
> > would be willing to sift through and make the changes for a patch
> > if/when it gets some +1s.
> >
> > /*
> > Joe Stein
> > http://www.linkedin.com/in/charmalloc
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > */
> >
> > On Sat, Nov 5, 2011 at 8:02 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Ah, a couple interesting ideas there.
> > >
> > > For the process builder solution, I think it is too dangerous for this
> > > use to pay off as an optimization. Java, being multithreaded by
> > > default, uses a fork implementation that doesn't seem to use
> > > copy-on-write. This means that if you have a 2GB process, you will do a
> > > 2GB allocation every time you instantiate a process before execing
> > > your 40k sync binary. This can cause really weird problems. We have
> > > been bitten by it in Hadoop and our job scheduler, both of which do
> > > that extensively to work around the lack of access to posix
> > > utilities. :-(
> > >
> > > Here is one of the bugs discussing this:
> > > https://issues.apache.org/jira/browse/HADOOP-5059
> > >
> > > I like your approach to logging, and I think that because you have
> > > declared the type for the log statements as msg: => String they are
> > > lazy, but what is the overhead of that closure? I would fear it might
> > > outweigh the cost of the string concatenation it replaces. I wonder if
> > > there is any benchmarking on this? If it isn't too expensive, I would
> > > love for us to switch to that style of logging.
> > >
> > > -Jay
> > >
> > > On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <crypt...@gmail.com> wrote:
> > >
> > > > Yeah, right, good point; that won't work for you.
> > > >
> > > > How about using ProcessBuilder? I do this for calling shell scripts
> > > > whenever I can't do something in an API (or it is already implemented
> > > > in bash scripts) and just need to execute something on the system.
> > > >
> > > > some object like this
> > > >
> > > >    private val reader = actor {
> > > >        debug("created actor: " + Thread.currentThread)
> > > >        var continue = true
> > > >        loopWhile(continue){
> > > >            reactWithin(WAIT_TIME) {
> > > >                case TIMEOUT =>
> > > >                    caller ! "react timeout"
> > > >                case proc:Process =>
> > > >                    debug("entering first actor " + Thread.currentThread)
> > > >                    val streamReader = new java.io.InputStreamReader(proc.getInputStream)
> > > >                    val bufferedReader = new java.io.BufferedReader(streamReader)
> > > >                    val stringBuilder = new java.lang.StringBuilder()
> > > >                    var line:String = null
> > > >                    while({line = bufferedReader.readLine; line != null}){
> > > >                        stringBuilder.append(line)
> > > >                        stringBuilder.append("\n")
> > > >                    }
> > > >                    bufferedReader.close
> > > >                    caller ! stringBuilder.toString
> > > >            }
> > > >        }
> > > >    }
> > > >
> > > >    def run(command: String, ran: (=> String)=>Unit) : Boolean = {
> > > >        debug("gonna run a command: " + Thread.currentThread + " = " + command)
> > > >        val args = command.split(" ")
> > > >        val processBuilder = new ProcessBuilder(args: _* )
> > > >        processBuilder.redirectErrorStream(true)
> > > >        val proc = processBuilder.start()
> > > >
> > > >        //Send the proc to the actor, to extract the console output.
> > > >        reader ! proc
> > > >
> > > >        //Receive the console output from the actor.
> > > >        receiveWithin(WAIT_TIME) {
> > > >            case TIMEOUT => ran("receiving Timeout")
> > > >            case result:String => ran(result)
> > > >        }
> > > >
> > > >        true
> > > >    }
> > > >
> > > >
> > > > Note my debug statement comes from a logging trait that takes a
> > > > by-name param, so there is no need to wrap calls in
> > > > if (log.isDebugEnabled()). I was going to bring this up anyway, as it
> > > > is something I noticed while going through the code last night (e.g.
> > > > https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> > > > ), and figure now is a good time. Also, you never have to do
> > > > logger.getLogger because that comes with the trait.
> > > >
> > > > Maybe make it a generic utility, and then you can just call it and
> > > > use it for sync() at the OS level like you need?
> > > >
> > > > On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <jay.kr...@gmail.com>
> wrote:
> > > >
> > > > > But does that sync that file or all files? Looking for the latter...
> > > > >
> > > > > -jay
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Nov 5, 2011, at 1:44 PM, Joe Stein <crypt...@gmail.com> wrote:
> > > > >
> > > > > > On the FileOutputStream you can get the FileDescriptor using
> > > > > > getFD() and then on that object you can sync().
> > > > > >
> > > > > > /*
> > > > > > Joe Stein
> > > > > > http://www.medialets.com
> > > > > > Twitter: @allthingshadoop
> > > > > > */
> > > > > >
> > > > > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <jay.kr...@gmail.com>
> wrote:
> > > > > >
> > > > > >> Does anyone know if there is an equivalent to the sync() system
> > > > > >> call available in Java? This is the system call that flushes all
> > > > > >> files. This seems like it might be a good optimization for the
> > > > > >> time-based log flush. If you are going to sequentially flush all
> > > > > >> the log partitions anyway, it might be better to just do a
> > > > > >> single sync() and let the I/O scheduler have more leeway in the
> > > > > >> ordering of the writes. I know how to get the equivalent of
> > > > > >> fsync() or fdatasync() using FileChannel.force(...) to flush a
> > > > > >> single file, but I don't know how to get the equivalent of
> > > > > >> sync().
> > > > > >>
> > > > > >> -Jay
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > /*
> > > > Joe Stein
> > > > http://www.linkedin.com/in/charmalloc
> > > > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > */
> > > >
> > >
> >
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/
